Abstract: There is a growing interest in the literature in adaptive Markov Chain Monte
Carlo methods based on sequences of random transition kernels $\{P_n\}$ where the kernel $P_n$
is allowed to have an invariant distribution $\pi_n$ not necessarily equal to the distribution of
interest $\pi$ (target distribution). These algorithms are designed such that as $n \to \infty$, $P_n$
converges to $P$, a kernel that has the correct invariant distribution $\pi$. Typically, $P$ is a
kernel with good convergence properties, but one that cannot be directly implemented. It
is then expected that the algorithm will inherit the good convergence properties of $P$. The
equi-energy sampler of [15] is an example of this type of adaptive MCMC. We show in this
paper that the asymptotic variance of this type of adaptive MCMC is always at least as
large as the asymptotic variance of the Markov chain with transition kernel $P$. We also show
by simulation that the difference can be substantial.
1. Introduction

Adaptive Markov Chain Monte Carlo (AMCMC) is an approach to Markov Chain Monte Carlo
(MCMC) simulation where the transition kernel of the algorithm is allowed to change over time
in an attempt to improve efficiency. It grows out of the seminal works of [11, 12]. Let $\pi$ be the
distribution of interest. The problem is to sample efficiently from $\pi$ given a family of Markov
kernels $\{P_\theta,\ \theta \in \Theta\}$. This can be solved adaptively using a joint process $\{(X_n, \theta_n),\ n \ge 0\}$ such
that the conditional distribution of $X_{n+1}$ given the information available up to time $n$ is $P_{\theta_n}$ and
where $\theta_n$ is adaptively tuned over time. Some general sufficient conditions for the convergence
of such algorithms can be found in [6, 18]. It is also shown in [1] that under some regularity
conditions, if a "best" limiting kernel $P_{\theta^*}$ exists, the marginal chain $\{X_n,\ n \ge 0\}$ in the joint
adaptive process behaves in many ways like a standard Markov chain with transition kernel $P_{\theta^*}$.
On the efficiency of some adaptive Monte Carlo Schemes

In all the above mentioned papers, the assumption that each $P_\theta$ has invariant distribution $\pi$ plays
an important role.

More recently, interest has emerged in building Monte Carlo algorithms where the transition
kernel $P_n$ used at time $n$ has invariant distribution $\pi_n$ not necessarily equal to $\pi$. These algorithms
are designed such that as $n \to \infty$, $P_n$ converges to a transition kernel $P$ which is invariant with
respect to $\pi$. This limiting kernel $P$ is typically a very efficient kernel that would be difficult
to implement otherwise. The interest of this approach is that as $n \to \infty$, $P_n$ approaches $P$ and
one expects the algorithm to inherit the good convergence properties of $P$. The Equi-Energy
(EE) sampler of [15] is an example. Another example based on importance resampling appeared
independently in [5] and [2].

This paper provides a detailed analysis of the law of large numbers and central limit theorem for 
the Equi-Energy sampler. It is also an attempt to address the question of whether such algorithms 
can deliver the same performance as their limiting kernel P. We give a negative answer. We show, 
in the case of the EE sampler, that its asymptotic variance is always at least as large as the 
asymptotic variance of the limiting transition kernel P. The difference can be substantial and we 
illustrate this with a simulation example. 

On the related literature, the law of large numbers for the equi-energy sampler has been
studied in [2], using techniques different from those in this work. We also mention a new
class of interacting MCMC algorithms proposed by [8, 10] for numerically solving some
discrete-time measure-valued equations. These algorithms share the same framework with the equi-energy
sampler. In these two papers, the authors develop a number of asymptotic results for interacting
MCMC including a strong law of large numbers and a central limit theorem.

The paper is organized as follows. In Section 2 we present the Equi-Energy sampler and
IR-MCMC in a slightly more general framework. The limit theorems are developed in Section 3 and proved in
Section 4. The main ingredient of the proofs is the martingale approximation method. We present
a simulation example in Section 3.5 comparing these algorithms to a Random Walk Metropolis
algorithm.

2. A class of adaptive Monte Carlo algorithms 

Let $(\mathcal{X}, \mathcal{B}, \lambda)$ be a reference Polish space equipped with its Borel $\sigma$-algebra $\mathcal{B}$ and a $\sigma$-finite
measure $\lambda$, and $K \ge 1$ an integer. We denote by $\mathcal{M}$ the set of all probability measures on $(\mathcal{X}, \mathcal{B})$.
Let $\{\pi^{(l)},\ l = 0, \ldots, K\}$ be probability measures on $(\mathcal{X}, \mathcal{B})$ such that:

$$\pi^{(l)}(dx) = \frac{1}{Z_l}\, e^{-E_l(x)}\, \lambda(dx), \tag{1}$$

for some measurable functions $E_l : (\mathcal{X}, \mathcal{B}) \to \mathbb{R}$. $Z_l := \int_{\mathcal{X}} e^{-E_l(x)}\, \lambda(dx)$ (assumed finite) is the
normalizing constant. We study a class of Monte Carlo algorithms to sample from the family
$\{\pi^{(l)}\}$. These algorithms will generate an ergodic random process $(X_n^{(0)}, \ldots, X_n^{(K)})$ on
$\mathcal{X}^{K+1}$ with limiting distribution $\pi^{(0)} \times \cdots \times \pi^{(K)}$.

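We introduce some notations in order to describe the algorithm. Whenever necessary and
without further notice, any subset under consideration will be equipped with its Borel $\sigma$-algebra. If $(\mathcal{Y}, \mathcal{E})$ and
$(\mathcal{Z}, \mathcal{F})$ are two measurable spaces, a kernel from $(\mathcal{Y}, \mathcal{E})$ to $(\mathcal{Z}, \mathcal{F})$ is any function $P : \mathcal{Y} \times \mathcal{F} \to [0, 1]$
such that $P(y, \cdot)$ is a probability measure on $(\mathcal{Z}, \mathcal{F})$ for all $y \in \mathcal{Y}$ and $P(\cdot, A)$ is a measurable
map for all $A \in \mathcal{F}$. If $(\mathcal{Y}, \mathcal{E}) = (\mathcal{Z}, \mathcal{F})$, we call $P$ a kernel on $(\mathcal{Z}, \mathcal{F})$. If $P$ is a kernel from $(\mathcal{Y}, \mathcal{E})$
to $(\mathcal{Z}, \mathcal{F})$, $f : (\mathcal{Z}, \mathcal{F}) \to \mathbb{R}$ a measurable function and $y \in \mathcal{Y}$, we shall use the notation $P(y, f)$ or
$Pf(y)$ to denote the integral $\int P(y, dz) f(z)$ whenever it is well defined.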

2.1. A general algorithm 

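Let $\{P^{(l)},\ l = 0, \ldots, K\}$ be kernels on $(\mathcal{X}, \mathcal{B})$ such that $\pi^{(l)}$ is the invariant distribution of $P^{(l)}$.
Let $\{T^{(l)},\ l = 1, \ldots, K\}$ be kernels from $(\mathcal{X}^2, \mathcal{B}^2)$ to $(\mathcal{X}, \mathcal{B})$, $\{\omega^{(l)},\ l = 1, \ldots, K\}$ positive
real-valued measurable functions defined on $(\mathcal{X}^2, \mathcal{B}^2)$ and $\theta_l \in (0, 1)$ for $l = 1, \ldots, K$. For $\mu \in \mathcal{M}$ and
$l = 1, \ldots, K$, we define the following kernel on $(\mathcal{X}, \mathcal{B})$:

$$P_\mu^{(l)}(x, A) = \theta_l P^{(l)}(x, A) + (1 - \theta_l)\, \frac{\int \mu(dy)\, \omega^{(l)}(x, y)\, T^{(l)}(y, x, A)}{\int \mu(dy)\, \omega^{(l)}(x, y)}, \quad x \in \mathcal{X},\ A \in \mathcal{B}. \tag{2}$$

For $n \ge 1$, we introduce the maps $H_n : \mathcal{M} \times \mathcal{X} \to \mathcal{M}$ defined as $H_n(\mu, x) = \mu + n^{-1}(\delta_x - \mu)$, where
$\delta_x$ is the Dirac measure. Let $\{(X_n^{(0)}, \ldots, X_n^{(K)}, \mu_n^{(0)}, \ldots, \mu_n^{(K-1)}),\ n \ge 0\}$ be the nonhomogeneous
Markov chain on $\mathcal{X}^{K+1} \times \mathcal{M}^K$ (defined on some probability space that can be taken as
the canonical space $(\mathcal{X}^{K+1} \times \mathcal{M}^K)^\infty$) with sequence of transition kernels $\mathbf{P}_n$ given by

$$\mathbf{P}_n\big((x^{(0)}, \ldots, x^{(K)}, \mu^{(0)}, \ldots, \mu^{(K-1)});\ (dy^{(0)}, \ldots, dy^{(K)}, d\nu^{(0)}, \ldots, d\nu^{(K-1)})\big)$$
$$= P^{(0)}(x^{(0)}, dy^{(0)}) \prod_{l=1}^{K} P^{(l)}_{\mu^{(l-1)}}(x^{(l)}, dy^{(l)}) \prod_{l=0}^{K-1} \delta_{H_n(\mu^{(l)}, y^{(l)})}(d\nu^{(l)}). \tag{3}$$

Throughout, we denote by $\{\mathcal{F}_n,\ n \ge 0\}$ the natural filtration of the process. We will assume that
the initial value of the process is fixed. For simplicity we take $\mu_0^{(l)} = 0$. Finally, we call $\mathbb{P}$ and $\mathbb{E}$
the probability distribution and expectation of the process.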

Algorithmically, $\{(X_n^{(0)}, \ldots, X_n^{(K)}, \mu_n^{(0)}, \ldots, \mu_n^{(K-1)}),\ n \ge 0\}$ can be described as follows.

Algorithm 2.1. At time $n$ and given $\{(X_k^{(0)}, \ldots, X_k^{(K)}, \mu_k^{(0)}, \ldots, \mu_k^{(K-1)}),\ k \le n - 1\}$:

1. Generate $X_n^{(0)} \sim P^{(0)}(X_{n-1}^{(0)}, \cdot)$.
2. For $l = 1, \ldots, K$, generate independently $X_n^{(l)}$ from $P^{(l)}_{\mu_{n-1}^{(l-1)}}(X_{n-1}^{(l)}, \cdot)$, as given by (2).

3. For $l = 0, \ldots, K-1$, set $\mu_n^{(l)} = H_n(\mu_{n-1}^{(l)}, X_n^{(l)}) = \mu_{n-1}^{(l)} + n^{-1}\big(\delta_{X_n^{(l)}} - \mu_{n-1}^{(l)}\big)$.
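
Step 3 of Algorithm 2.1 is a running-average update: integrating a test function $f$ against $\mu_n^{(l)}$ gives the sample mean of $f(X_1^{(l)}), \ldots, f(X_n^{(l)})$. The following minimal sketch illustrates this reading of the recursion; the function names are hypothetical and the measure is represented only through its action on one fixed $f$.

```python
def update_running_mean(prev_mean: float, n: int, fx: float) -> float:
    """One application of H_n, read through a fixed test function f.

    Implements mu_n(f) = mu_{n-1}(f) + (f(x_n) - mu_{n-1}(f)) / n.
    """
    return prev_mean + (fx - prev_mean) / n


def empirical_mean_via_recursion(values):
    """Iterate the recursion from mu_0(f) = 0 over f(X_1), ..., f(X_n)."""
    mean = 0.0  # convention mu_0 = 0, as in the text
    for n, fx in enumerate(values, start=1):
        mean = update_running_mean(mean, n, fx)
    return mean
```

In an actual sampler one would more likely store the past states themselves, since Step 2 needs to resample from them; the recursion above only shows that $\mu_n^{(l)}$ is the empirical measure of the chain at level $l$.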

The heuristic of the algorithm is the following. By construction, $\{X_n^{(0)},\ n \ge 0\}$ is a Markov
chain with kernel $P^{(0)}$ and invariant distribution $\pi^{(0)}$. If this chain is ergodic, then as $n \to \infty$, the kernel in
$\mathbb{P}(X_n^{(1)} \in A \mid \mathcal{F}_{n-1}) = P^{(1)}_{\mu_{n-1}^{(0)}}(X_{n-1}^{(1)}, A)$ will converge to $K^{(1)}$, where $K^{(l)}$ is given by

$$K^{(l)}(x, A) = \theta_l P^{(l)}(x, A) + (1 - \theta_l)\, \frac{1}{z^{(l)}(x)} \int \pi^{(l-1)}(dy)\, \omega^{(l)}(x, y)\, T^{(l)}(y, x, A), \tag{4}$$

where $z^{(l)}(x) = \int \pi^{(l-1)}(dy)\, \omega^{(l)}(x, y)$. We will discuss below two ways of choosing $\omega^{(l)}$ and $T^{(l)}$
so that $K^{(l)}$ has invariant distribution $\pi^{(l)}$. With these choices we can reasonably expect $\{X_n^{(1)}\}$
to be ergodic with limiting distribution $\pi^{(1)}$. The same argument can then be repeated. In other
words, with an appropriate choice of $\omega^{(l)}$ and $T^{(l)}$, the marginal process $\{X_n^{(l)},\ n \ge 0\}$ can be used
for Monte Carlo simulation from $\pi^{(l)}$.

2.2. Importance-Resampling MCMC

For $l = 1, \ldots, K$ define the importance function:

$$r^{(l)}(x) = \exp\big(E_{l-1}(x) - E_l(x)\big).$$

In Algorithm 2.1 we can take $\omega^{(l)}(x, y) = r^{(l)}(y)$ and $T^{(l)}(y, x, A) = T^{(l)}(y, A)$ where $T^{(l)}$ is
some kernel on $(\mathcal{X}, \mathcal{B})$ with invariant distribution $\pi^{(l)}$. This leads to the IR-MCMC algorithm
([5], [2]). In this case, Step 2 of Algorithm 2.1 can be described as follows: with probability $\theta_l$
we sample $X_n^{(l)}$ from $P^{(l)}(X_{n-1}^{(l)}, \cdot)$ and with probability $1 - \theta_l$, we obtain $Y^{(l-1)}$ by resampling
from $\{X_k^{(l-1)},\ k \le n-1\}$ with weights $\{r^{(l)}(X_k^{(l-1)}),\ k \le n-1\}$ and then propose $X_n^{(l)} \sim T^{(l)}(Y^{(l-1)}, \cdot)$.

The $l$-th limiting kernel here takes the form

$$K^{(l)}(x, A) = \theta_l P^{(l)}(x, A) + (1 - \theta_l)\, \pi^{(l)}(A),$$

has invariant distribution $\pi^{(l)}$ and has better mixing than $P^{(l)}$. But direct sampling from $K^{(l)}$ is
impossible as it requires that we be able to sample from $\pi^{(l)}$, which is the problem that we are
trying to solve in the first place.
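
The resampling part of the IR-MCMC step can be sketched as follows. This is an illustration only: the energies are passed as plain callables, the history is a Python list of past states of the lower-level chain, and all names are hypothetical rather than taken from [5] or [2].

```python
import math
import random


def importance_weight(x, energy_prev, energy_cur):
    """Importance function r(x) = exp(E_{l-1}(x) - E_l(x))."""
    return math.exp(energy_prev(x) - energy_cur(x))


def resample_from_history(history, energy_prev, energy_cur, rng):
    """Draw one past state of chain l-1 with probability proportional to r."""
    weights = [importance_weight(y, energy_prev, energy_cur) for y in history]
    # random.Random.choices performs weighted sampling with replacement.
    return rng.choices(history, weights=weights, k=1)[0]
```

The draw returned by `resample_from_history` plays the role of $Y^{(l-1)}$; the new state is then proposed from the kernel $T^{(l)}$ started at that point. Recomputing all weights at every step costs $O(n)$ per iteration; caching them as the history grows would be the natural refinement.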

2.3. The Equi-Energy sampler 

Taking $\omega^{(l)}(x, y) \equiv 1$ and

$$T^{(l)}(y, x, A) = \min\Big(1, \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big)\, \delta_y(A) + \Big(1 - \min\Big(1, \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big)\Big)\, \delta_x(A), \tag{5}$$

in (2), we get the EE sampler ([15]). In this case the limiting kernel becomes:

$$K^{(l)}(x, A) = \theta_l P^{(l)}(x, A) + (1 - \theta_l) \int_{\mathcal{X}} \pi^{(l-1)}(dy)\, T^{(l)}(y, x, A)$$
$$= \theta_l P^{(l)}(x, A) + (1 - \theta_l)\, R^{(l)}(x, A), \tag{6}$$

where $R^{(l)}$ is the kernel of the Metropolis-Hastings algorithm with proposal $\pi^{(l-1)}$ and target
distribution $\pi^{(l)}$:

$$R^{(l)}(x, A) = \int_A \min\Big(1, \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big)\, \pi^{(l-1)}(dy) + \Big[1 - \int_{\mathcal{X}} \min\Big(1, \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big)\, \pi^{(l-1)}(dy)\Big]\, \delta_x(A).$$

Clearly, $K^{(l)}$ has invariant distribution $\pi^{(l)}$. In general $K^{(l)}$ will converge faster than $P^{(l)}$. For
example if $E_l - E_{l-1}$ is bounded from below it is easy to show that $K^{(l)}$ is always uniformly
ergodic, independently of $P^{(l)}$.

For the EE sampler, Step 2 of Algorithm 2.1 can now be described as follows. With probability
$\theta_l$ we sample $X_n^{(l)}$ from $P^{(l)}(X_{n-1}^{(l)}, \cdot)$ and with probability $1 - \theta_l$, we obtain $Y^{(l)}$ by resampling
uniformly from $\{X_k^{(l-1)} : k \le n-1\}$. Then $Y^{(l)}$ is accepted with probability $\min\big(1, r^{(l)}(Y^{(l)}) / r^{(l)}(X_{n-1}^{(l)})\big)$,
in which case we set $X_n^{(l)} = Y^{(l)}$; otherwise $Y^{(l)}$ is rejected and we set $X_n^{(l)} = X_{n-1}^{(l)}$.
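
The simplified EE update described above can be sketched in a few lines. This is a sketch under stated assumptions, not the implementation of [15]: states are arbitrary Python objects, `local_move` stands for one draw from $P^{(l)}(x, \cdot)$, and the function and parameter names are hypothetical. When the history is still empty the sketch falls back on the local move, a choice made here for convenience.

```python
import math
import random


def ee_step(x_prev, history, energy_prev, energy_cur, local_move, theta, rng):
    """One update of chain l in the simplified EE sampler (no partitioning).

    With probability theta: local move from the kernel P^(l).
    Otherwise: draw Y uniformly from the past of chain l-1 and accept it with
    probability min(1, r(Y) / r(x_prev)), where r = exp(E_{l-1} - E_l).
    """
    if rng.random() < theta or not history:
        return local_move(x_prev, rng)
    y = rng.choice(history)
    # log r(y) - log r(x_prev), computed on the log scale for stability
    log_ratio = (energy_prev(y) - energy_cur(y)) - (
        energy_prev(x_prev) - energy_cur(x_prev)
    )
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y  # equi-energy jump accepted
    return x_prev  # jump rejected, chain stays put
```

Working with the log of the ratio avoids overflow when the energies are large; the acceptance probability itself is exactly the one in (5).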

Actually the EE sampler described above is a simplified version of [15]. Their original
algorithm uses an idea of partitioning. Let $\{\mathcal{X}_i,\ i = 1, \ldots, d\}$ be a partition of $\mathcal{X}$ (in [15],
$E_l(x) = E(x)/t_l$ and they take $\mathcal{X}_i = \{x \in \mathcal{X} : E_{i-1} < E(x) < E_i\}$ for some predefined
values $E_0 < E_1 < \cdots < E_d$). Define the function $I(x) = i$ if $x \in \mathcal{X}_i$; so $\mathcal{X}_{I(x)}$ represents the component
of the partition to which $x$ belongs. Now set $\omega^{(l)}(x, y) = \mathbf{1}_{\mathcal{X}_{I(x)}}(y)$ and $T^{(l)}$ as in (5) and we get
the EE sampler of [15]. In this general case, the limiting kernel has the same form as in (6)
but where $R^{(l)}$ is now a Metropolis-Hastings algorithm with target distribution $\pi^{(l)}$ and proposal
kernel $Q^{(l)}(x, dy) \propto \pi^{(l-1)}(y)\, \mathbf{1}_{\mathcal{X}_{I(x)}}(y)\, \lambda(dy)$. Partitioning the state space and using this
proposal works well in practice as it can allow large jumps in the
state space to be accepted. But it does not add any significant feature to the algorithm from the
theoretical standpoint. Therefore, and to simplify the analysis, we only consider the case where
no partitioning is used ($\mathcal{X}_{I(x)} = \mathcal{X}$ for all $x \in \mathcal{X}$).

3. Asymptotics of the Equi-Energy sampler 

For the remainder of the paper, we restrict our attention to the EE sampler. In other words, we
consider the process defined in Section 2 with $\omega^{(l)}(x, y) \equiv 1$ and $T^{(l)}$ as defined in (5).



imsart ver. 2005/10/19 file: Eff2Rev2.tex date: November 2, 2009 



3.1. Notations and assumptions 






We start with some notations. If $P_1, P_2$ are kernels on $(\mathcal{X}, \mathcal{B})$, the product $P_1 P_2$ is the kernel
$P_1 P_2(x, A) = \int P_1(x, dy) P_2(y, A)$. If $\mu$ is a signed measure on $(\mathcal{X}, \mathcal{B})$, we write $\mu(f)$ to denote
the integral $\int \mu(dx) f(x)$ and we will also use $\mu$ to denote the linear functional on the space of
$\mathbb{R}$-valued functions on $(\mathcal{X}, \mathcal{B})$ thus induced. Similarly, we will write $\mu P_1(A)$ for $\int \mu(dx) P_1(x, A)$. Let
$V : \mathcal{X} \to [1, \infty)$ be given. For $f : (\mathcal{X}, \mathcal{B}) \to \mathbb{R}$, we define its $V$-norm as $|f|_V := \sup_{x \in \mathcal{X}} |f(x)| / V(x)$ and
we introduce the space $L_V$ of measurable real-valued functions $f$ defined on $\mathcal{X}$ such that $|f|_V < \infty$.
For a signed measure $\mu$ on $(\mathcal{X}, \mathcal{B})$ we define $\|\mu\|_V := \sup\{|\mu(f)|,\ f \in L_V,\ |f|_V \le 1\}$. We
equip $\mathcal{M}$, the set of all probability measures on $(\mathcal{X}, \mathcal{B})$, with the metric $\|\mu - \nu\|_V$ and the Borel
$\sigma$-algebra $\mathcal{B}_{\mathcal{M}}(V)$ induced by $\|\cdot\|_V$. Whenever $V$ is understood, we will write $(\mathcal{M}, \mathcal{B}_{\mathcal{M}})$ instead of
$(\mathcal{M}, \mathcal{B}_{\mathcal{M}}(V))$. For a linear operator $T$ from $(L_V, |\cdot|_V)$ into itself, we define its operator norm by
$|||T|||_V := \sup\{|Tf|_V,\ f \in L_V,\ |f|_V \le 1\}$.
We assume that $\pi^{(l)}$ is of the form:

$$\pi^{(l)}(dx) = \frac{1}{Z_l}\, e^{-E(x)/t_l}\, \lambda(dx), \tag{7}$$

for some continuous function $E : (\mathcal{X}, \mathcal{B}) \to \mathbb{R}$ that is bounded from below, and $t_0 > \cdots > t_K = 1$
is a decreasing sequence of positive numbers (temperatures). In addition, we make the following
assumption.

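Assumption (A1): For $l = 1, \ldots, K$, there exist a set $C_l \subset \mathcal{X}$, a probability measure $\phi_l$ such
that $\phi_l(C_l) > 0$, an integer $n_0 > 0$ and constants $\lambda_l \in (0, 1)$, $b_l \in [0, \infty)$, $\varepsilon_l \in (0, 1]$ such that for
$x \in \mathcal{X}$ and $A \in \mathcal{B}$,

$$\big(P^{(l)}\big)^{n_0}(x, A) \ge \varepsilon_l\, \phi_l(A)\, \mathbf{1}_{C_l}(x), \tag{8}$$

and

$$P^{(l)} V(x) \le \lambda_l V(x) + b_l \mathbf{1}_{C_l}(x), \tag{9}$$

where $V(x) = c\, e^{\kappa E(x)} \ge 1$ for some finite constants $c > 0$ and $\kappa \in (0, 1)$ with $0 < \kappa < \frac{1}{t_l} - \frac{1}{t_{l-1}}$.
Moreover

$$\frac{1}{1 + (1 - \lambda_l)\big(\kappa^{-1}(t_l^{-1} - t_{l-1}^{-1}) - 1\big)} < \theta_l < 1, \quad l = 1, \ldots, K. \tag{10}$$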

Remark 3.1. 1. The drift and minorization conditions (8)-(9) of (A1) can be checked for
many practical examples. If each $P^{(l)}$ is a Random Walk Metropolis kernel or a Metropolis
Adjusted Langevin kernel then (8) and (9) are known to hold under some regularity
conditions on the energy function $E$ (see [4, 13]). In these cases, it is always possible to choose $\kappa$
small enough to satisfy $0 < \kappa < \frac{1}{t_l} - \frac{1}{t_{l-1}}$.
2. The condition (10) is a technical condition that quantifies the idea that the rate of
resampling $1 - \theta_l$ should not be too large. It is needed to guarantee that the geometric drift
condition (9) on $P^{(l)}$ transfers to kernels of the type $P_\mu^{(l)}$ that drive the EE sampler.
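
To make the role of condition (10) concrete, the sketch below computes the lower bound on $\theta_l$ and the resulting drift rate. It assumes that (10) reads $\theta_l > 1/(1 + (1-\lambda_l)\delta_l^{-1})$ with $\delta_l = (\kappa^{-1}(t_l^{-1} - t_{l-1}^{-1}) - 1)^{-1}$, and that the drift rate of $P_\mu^{(l)}$ is $\theta_l\lambda_l + (1-\theta_l)(1+\delta_l)$, as in the proof of Lemma 4.1. The function names are hypothetical.

```python
def resampling_threshold(lam, kappa, t_cur, t_prev):
    """Return (lower bound on theta, delta) under the assumed form of (10).

    lam    : drift constant lambda_l of P^(l) in (9)
    kappa  : exponent in V(x) = c * exp(kappa * E(x))
    t_cur  : temperature t_l
    t_prev : temperature t_{l-1} (hotter, so t_prev > t_cur)
    """
    tau = 1.0 / t_cur - 1.0 / t_prev
    if not 0.0 < kappa < tau:
        raise ValueError("need 0 < kappa < 1/t_l - 1/t_{l-1}")
    delta = 1.0 / (tau / kappa - 1.0)
    return 1.0 / (1.0 + (1.0 - lam) / delta), delta


def drift_rate(theta, lam, delta):
    """Drift rate theta*lambda + (1 - theta)*(1 + delta) of the mixture kernel."""
    return theta * lam + (1.0 - theta) * (1.0 + delta)
```

For instance, with $\lambda_l = 0.5$, $\kappa = 0.1$, $t_l = 1$ and $t_{l-1} = 2$, one gets $\delta_l = 0.25$ and a threshold of $1/3$: any $\theta_l$ above it gives a drift rate strictly below one, while at the threshold the rate equals one and the drift condition is lost.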

3.2. Law of large numbers 

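We consider an arbitrary pair $\{(X_n^{(l-1)}, X_n^{(l)}),\ n \ge 0\}$. We will show that under (A1), if $\{X_n^{(l-1)},\ n \ge 0\}$
satisfies a strong law of large numbers, then so does $\{X_n^{(l)},\ n \ge 0\}$. Then we use the fact that
$\{X_n^{(0)},\ n \ge 0\}$ is an ergodic Markov chain to derive a law of large numbers for any $\{X_n^{(l)},\ n \ge 0\}$.

Theorem 3.1. Assume (A1) and let $\beta \in [0, 1)$. Let $f : (\mathcal{M}, \mathcal{B}_{\mathcal{M}}) \times (\mathcal{X}, \mathcal{B}) \to \mathbb{R}$ be a measurable
function such that

$$\sup_{\nu \in \mathcal{M}} |f_\nu|_{V^\beta} < \infty. \tag{11}$$

Suppose that there exists a finite constant $C$ such that for any $\nu, \mu \in \mathcal{M}$,

$$|f_\nu - f_\mu|_{V^\beta} \le C \|\nu - \mu\|_{V^\beta}. \tag{12}$$

Suppose also that for any $h \in L_{V^\beta}$,

$$\frac{1}{n} \sum_{k=1}^{n} h(X_k^{(l-1)}) \to \pi^{(l-1)}(h), \quad \mathbb{P}\text{-a.s. as } n \to \infty, \tag{13}$$

and that there exists $\mathcal{D} \in \mathcal{F}$, $\mathbb{P}(\mathcal{D}) = 1$, such that for each sample path $\omega \in \mathcal{D}$, $f_{\mu_n^{(l-1)}(\omega)}(x)$
converges to $f_{\pi^{(l-1)}}(x)$ as $n \to \infty$ for all $x \in \mathcal{X}$. Then

$$\frac{1}{n} \sum_{k=1}^{n} f_{\mu_{k-1}^{(l-1)}}(X_k^{(l)}) \to \pi^{(l)}\big(f_{\pi^{(l-1)}}\big), \quad \mathbb{P}\text{-a.s. as } n \to \infty. \tag{14}$$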

Proof. See Section 4.3. □ 

The following Corollary is then immediate. 

Corollary 3.1. Assume (A1) and suppose that $\{X_n^{(0)},\ n \ge 0\}$ is a $\phi$-irreducible aperiodic Markov
chain with invariant distribution $\pi^{(0)}$ and $\pi^{(0)}(V) < \infty$. Let $f \in L_{V^\beta}$, $\beta \in [0, 1)$. Then for any
$l \in \{1, \ldots, K\}$,

$$\frac{1}{n} \sum_{k=1}^{n} f(X_k^{(l)}) \to \pi^{(l)}(f), \quad \mathbb{P}\text{-a.s. as } n \to \infty. \tag{15}$$

3.3. Central limit with a random centering 

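We now turn to central limit theorems. It can be shown that the kernel $P_\mu^{(l)}$ admits a unique
invariant distribution $\pi_\mu^{(l)}$. Since the conditional distribution of $X_n^{(l)}$ given $\mathcal{F}_{n-1}$ is $P^{(l)}_{\mu_{n-1}^{(l-1)}}(X_{n-1}^{(l)}, \cdot)$, it is
natural to consider a central limit theorem for $\sum_{k=1}^{n} f(X_k^{(l)})$ in which $f(X_k^{(l)})$ is centered around
$\pi^{(l)}_{\mu_{k-1}^{(l-1)}}(f)$. This is done in the next theorem. $\Rightarrow$ denotes weak convergence and $\mathcal{N}(\mu, \sigma^2)$ denotes
the Gaussian distribution on $\mathbb{R}$ with mean $\mu$ and variance $\sigma^2$.

Theorem 3.2. Assume (A1). Let $f \in L_{V^\beta}$, $\beta \in [0, 1/2)$, be such that $\pi^{(l)}(f) = 0$. Define

$$\sigma_l^2(f) := \pi^{(l)}(f^2) + 2 \sum_{k=1}^{\infty} \int \pi^{(l)}(dx)\, f(x)\, \big(K^{(l)}\big)^k f(x), \tag{16}$$

where $K^{(l)}$ is given by (6). Assume that $\sigma_l^2(f) > 0$. Then there exists a random sequence $\{\pi_n^{(l)}(f)\}$,
$\pi_n^{(l)}(f) \to \pi^{(l)}(f)$ (almost surely) as $n \to \infty$, such that:

$$\frac{1}{\sigma_l(f)\sqrt{n}} \sum_{k=1}^{n} \big[f(X_k^{(l)}) - \pi_k^{(l)}(f)\big] \Rightarrow \mathcal{N}(0, 1) \quad \text{as } n \to \infty. \tag{17}$$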

Proof. See Section 4.4. □ 
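
For intuition about the asymptotic variance in (16), it can be computed exactly on a finite state space, where a kernel is a stochastic matrix. The sketch below is an illustration outside the setting of the paper: it uses the identity $\pi(f^2) + 2\sum_{k \ge 1} \pi(f\, K^k f) = \pi\big(f (2U - f)\big)$ with $U = \sum_{k \ge 0} K^k f$ for a centered $f$, and obtains $U$ from a linear solve. The function names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi P = pi, sum(pi) = 1 for an ergodic stochastic matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def asymptotic_variance(P, f):
    """pi(f^2) + 2 * sum_{k>=1} pi(f P^k f), after centering f under pi."""
    pi = stationary_distribution(P)
    f = f - pi @ f
    n = P.shape[0]
    # U = sum_{k>=0} P^k f solves (I - P + 1 pi^T) U = f when pi(f) = 0.
    U = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return float(pi @ (f * (2.0 * U - f)))
```

For the symmetric two-state chain that switches state with probability $a$ and $f = (1, -1)$, this returns $(1-a)/a$, which equals 1 for independent draws ($a = 1/2$) and grows as the chain becomes stickier.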
3.4. Central limit theorem with a deterministic centering 

We now derive a central limit theorem for $\sum_{k=1}^{n} f(X_k^{(l)})$ around $\pi^{(l)}(f)$, which gives more insight
into the efficiency of the method as a Monte Carlo sampler from $\pi^{(l)}$. We restrict ourselves to the
case where $l = 1$; that is, we only consider the pair $\{(X_n^{(0)}, X_n^{(1)}),\ n \ge 0\}$. Moreover we assume in
this section that $\mathcal{X}$ is a compact subset of $\mathbb{R}^d$ (equipped with its Euclidean metric). More precisely:

Assumption (A1'): $\mathcal{X}$ is a compact subset of $\mathbb{R}^d$. For $l = 0, 1$, there exist an integer $n_0 > 0$, a
constant $\varepsilon_l \in (0, 1]$ and a probability measure $\phi_l$ such that for $x \in \mathcal{X}$ and $A \in \mathcal{B}$,

$$\big(P^{(l)}\big)^{n_0}(x, A) \ge \varepsilon_l\, \phi_l(A). \tag{18}$$

Let $C(\mathcal{X}, \mathbb{R})$ be the space of all continuous functions from $\mathcal{X} \to \mathbb{R}$. We endow $C(\mathcal{X}, \mathbb{R})$ with
the uniform metric $|f|_\infty := \sup_{x \in \mathcal{X}} |f(x)|$ and its Borel $\sigma$-algebra. Let $\mathrm{Lip}(\mathcal{X}, \mathbb{R})$ be the subset of
Lipschitz functions of $C(\mathcal{X}, \mathbb{R})$ (we say that $f : \mathcal{X} \to \mathbb{R}$ is Lipschitz if there exists $C < \infty$ such
that for any $x, y \in \mathcal{X}$, $|f(x) - f(y)| \le C|x - y|$).

For $f : \mathcal{X} \to \mathbb{R}$ bounded measurable, define the function

$$U(x) = U_f(x) := \sum_{j \ge 0} \big(P^{(1)}_{\pi^{(0)}}\big)^j f(x),$$

the solution to the Poisson equation for $f$ and $P^{(1)}_{\pi^{(0)}}$. To simplify the notations, we omit the
dependence of $U$ on $f$. Notice that $P^{(1)}_{\pi^{(0)}}$ is the limiting kernel in the EE sampler, denoted $K^{(1)}$
in (6). Clearly, (A1') implies, as shown in Lemma 4.1 below, that the kernel $P_\mu^{(1)}$ is also uniformly
ergodic, uniformly in $\mu$. In particular $|U|_\infty < \infty$. We assume that the function $U$ is Lipschitz
whenever $f$ is Lipschitz:

$$f \in \mathrm{Lip}(\mathcal{X}, \mathbb{R}) \ \text{ implies that } \ \sum_{j \ge 0} \big(P^{(1)}_{\pi^{(0)}}\big)^j f \in \mathrm{Lip}(\mathcal{X}, \mathbb{R}). \tag{19}$$

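We comment on (19) below. Let $f \in C(\mathcal{X}, \mathbb{R})$ be such that $\pi^{(1)}(f) = 0$. Consider the partial sum
$S_n = \sum_{k=1}^{n} f(X_k^{(1)})$. Since $U$ satisfies the Poisson equation $U - P^{(1)}_{\pi^{(0)}} U = f$, we can rewrite $S_n$ as

$$S_n = \sum_{k=1}^{n} \Big[U(X_k^{(1)}) - P^{(1)}_{\pi^{(0)}} U(X_k^{(1)})\Big]$$
$$= M_n + \sum_{k=1}^{n} \Big[P^{(1)}_{\mu_{k-1}^{(0)}} U(X_{k-1}^{(1)}) - P^{(1)}_{\pi^{(0)}} U(X_{k-1}^{(1)})\Big] + \epsilon_n^{(1)},$$

where $M_n = \sum_{k=1}^{n} \big[U(X_k^{(1)}) - P^{(1)}_{\mu_{k-1}^{(0)}} U(X_{k-1}^{(1)})\big]$ is a martingale and $\epsilon_n^{(1)} = P^{(1)}_{\pi^{(0)}} U(X_0^{(1)}) - P^{(1)}_{\pi^{(0)}} U(X_n^{(1)})$.
We introduce the function

$$H_x(y) := T^{(1)}(y, x, U) - R^{(1)}(x, U) = \int T^{(1)}(y, x, dz)\, U(z) - \int \pi^{(0)}(dy) \int T^{(1)}(y, x, dz)\, U(z). \tag{20}$$

Since $P^{(1)}_\mu(x, dz) = \theta_1 P^{(1)}(x, dz) + (1 - \theta_1) \int \mu(dy)\, T^{(1)}(y, x, dz)$, we have

$$P^{(1)}_\mu U(x) - P^{(1)}_{\pi^{(0)}} U(x) = (1 - \theta_1) \int \mu(dy)\, H_x(y),$$

so that we can rewrite $S_n$ as

$$S_n = M_n + (1 - \theta_1) \sum_{k=1}^{n-1} \frac{1}{k} \sum_{j=1}^{k} H_{X_k^{(1)}}(X_j^{(0)}) + \epsilon_n^{(1)} = M_n + (1 - \theta_1) \sum_{k=1}^{n-1} k^{-1/2}\, \eta_k(X_k^{(1)}) + \epsilon_n^{(1)},$$

where $\eta_n$ is the random field

$$\eta_n(x) = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} H_x(X_k^{(0)}).$$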

We will see that $\eta_n$ is a $C(\mathcal{X}, \mathbb{R})$-valued random element. To describe its asymptotic behavior we
introduce the function

$$U_x^{(0)}(y) := \sum_{j \ge 0} \big(P^{(0)}\big)^j H_x(y),$$

where for a kernel $Q$, $QH_x(y) = \int Q(y, dz)\, H_x(z)$, and the covariance function

$$\Gamma(x, y) = \int \Big[U_x^{(0)}(z)\, U_y^{(0)}(z) - \big(P^{(0)} U_x^{(0)}(z)\big)\big(P^{(0)} U_y^{(0)}(z)\big)\Big]\, \pi^{(0)}(dz). \tag{21}$$

For $f, g \in C(\mathcal{X}, \mathbb{R})$, with an abuse of notation we will also write $\Gamma(f, g)$ for the quantity

$$\Gamma(f, g) = \int \Big[U_f^{(0)}(z)\, U_g^{(0)}(z) - \big(P^{(0)} U_f^{(0)}(z)\big)\big(P^{(0)} U_g^{(0)}(z)\big)\Big]\, \pi^{(0)}(dz),$$

where $U_f^{(0)}(x) = \sum_{j \ge 0} \big(P^{(0)}\big)^j f(x)$.

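Theorem 3.3. Assume (A1'), (19) and suppose that $E \in \mathrm{Lip}(\mathcal{X}, \mathbb{R})$. Let $f \in \mathrm{Lip}(\mathcal{X}, \mathbb{R})$ be such
that $\pi^{(1)}(f) = 0$. Then

$$\frac{1}{\sqrt{n}} \sum_{k=1}^{n} f(X_k^{(1)}) \Rightarrow \mathcal{N}\big(0,\ \sigma_1^2(f) + 4(1 - \theta_1)^2\, \Gamma(g, g)\big) \quad \text{as } n \to \infty, \tag{22}$$

where $g(\cdot) := \int \pi^{(1)}(dx)\, H_x(\cdot)$ and

$$\sigma_1^2(f) = \pi^{(1)}(f^2) + 2 \sum_{k=1}^{\infty} \int \pi^{(1)}(dx)\, f(x)\, \big(P^{(1)}_{\pi^{(0)}}\big)^k f(x). \tag{23}$$

Proof. See Section 4.5. □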

Notice from (20) that $g(\cdot) = \int \pi^{(1)}(dx)\, T^{(1)}(\cdot, x, U) - \int \pi^{(0)}(dz) \int \pi^{(1)}(dx)\, T^{(1)}(z, x, U)$. Thus
Theorem 3.3 shows that the asymptotic variance of the EE sampler is the sum of the asymptotic
variance in estimating $\pi^{(1)}(f)$ as if the limiting kernel $P^{(1)}_{\pi^{(0)}}$ were known (the term $\sigma_1^2(f)$) plus
a multiple of the asymptotic variance in using the chain $\{X_n^{(0)},\ n \ge 0\}$ to estimate the expectation under $\pi^{(0)}$ of
the function $\int \pi^{(1)}(dx)\, T^{(1)}(\cdot, x, U)$. In their analysis, [8] arrive at a similar CLT for interacting
MCMC algorithms. Notice also that $U(x) = \sum_{j \ge 0} \big(P^{(1)}_{\pi^{(0)}}\big)^j f(x)$. Thus in most cases, the function
$\int \pi^{(1)}(dx)\, T^{(1)}(\cdot, x, U)$ will typically take large values and the asymptotic variance in estimating
its expectation will also tend to be large, particularly if the kernel $P^{(0)}$ mixes poorly. Theorem 3.3
thus suggests that for the EE sampler to be effective in practice it is important that the initial
chain $\{X_n^{(0)},\ n \ge 0\}$ mixes very fast.



A remaining question is to know whether $n^{-1}\, \mathbb{E}\big[\big(\sum_{k=1}^{n} f(X_k^{(1)})\big)^2\big]$ converges to $\sigma_1^2(f) + 4(1 - \theta_1)^2\, \Gamma(g, g)$.
Unfortunately the answer is no in general, as shown by the following example.

Proposition 3.1. Assume (A1'). Suppose that $P^{(0)} = P^{(1)} = P$ and $\pi^{(0)} = \pi^{(1)} = \pi$. Let
$f : \mathcal{X} \to \mathbb{R}$ be a bounded measurable function such that $\pi(f) = 0$. Then

$$\lim_{n \to \infty} n^{-1}\, \mathbb{E}\Big[\Big(\sum_{k=1}^{n} f(X_k^{(1)})\Big)^2\Big] = \sigma_1^2(f) + 2(1 - \theta_1)^2\, \Gamma(g, g).$$

In the present case $g(x) = U(x) = \sum_{j \ge 0} \theta_1^j P^j f(x)$ and

$$\sigma_1^2(f) = \pi(f^2) + 2 \sum_{k=1}^{\infty} \theta_1^k \int \pi(dx)\, f(x)\, P^k f(x).$$

Proof. See Section 4.6. □
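
The size of the extra variance term can be checked by hand on a toy finite-state example in the setting of Proposition 3.1 ($P^{(0)} = P^{(1)} = P$, $\pi^{(0)} = \pi^{(1)} = \pi$). The sketch below is an illustration, not part of the paper: it assumes the limit in Proposition 3.1 reads $\sigma_1^2(f) + 2(1-\theta_1)^2\Gamma(g, g)$, uses the fact that here the limiting kernel is $K = \theta_1 P + (1-\theta_1)\pi$, and all function names are hypothetical.

```python
import numpy as np


def poisson_solution(P, pi, h):
    """Return sum_{j>=0} P^j (h - pi(h)) for a stochastic matrix P."""
    n = len(pi)
    h = h - pi @ h
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), h)


def variance_terms(P, pi, f, theta):
    """Return (sigma_1^2(f), Gamma(g, g), sigma_1^2(f) + 2(1-theta)^2 Gamma(g, g))."""
    n = len(pi)
    Pi = np.outer(np.ones(n), pi)           # kernel that samples exactly from pi
    K = theta * P + (1.0 - theta) * Pi      # limiting kernel when pi^(0) = pi^(1)
    f = f - pi @ f
    U = poisson_solution(K, pi, f)          # U = sum_j K^j f = sum_j theta^j P^j f
    sigma2 = float(pi @ (f * (2.0 * U - f)))
    g = U                                   # in this special case g = U
    U0 = poisson_solution(P, pi, g)         # Poisson solution for g under P^(0) = P
    PU0 = P @ U0
    gamma = float(pi @ (U0 ** 2 - PU0 ** 2))
    return sigma2, gamma, sigma2 + 2.0 * (1.0 - theta) ** 2 * gamma
```

For the symmetric two-state chain with switching probability $0.25$, $f = (1, -1)$ and $\theta_1 = 0.5$, the three returned values are $5/3$, $16/3$ and $13/3$. The plain chain with kernel $P$ has asymptotic variance 3 in this example, so the limiting kernel would improve on $P$ while the adaptive scheme, under the assumed formula, does worse than both.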

Remark 3.2. Assumption (19) can often be easily checked. Indeed we have $U(x) = f(x) + P^{(1)}_{\pi^{(0)}} U(x)$
where $P^{(1)}_{\pi^{(0)}} = \theta_1 P^{(1)} + (1 - \theta_1) R^{(1)}$, and $R^{(1)}$ is the independent Metropolis-Hastings
algorithm with target $\pi^{(1)}$ and proposal $\pi^{(0)}$. Let us assume that $P^{(1)}$ is also a Metropolis-Hastings
kernel with target $\pi^{(1)}$ and proposal $q(x, y)$. Denote by $\alpha(x, y)$ (resp. $\bar\alpha(x, y)$) the acceptance
probability of $P^{(1)}$ (resp. $R^{(1)}$), and denote by $a(x) := \int \alpha(x, y)\, q(x, y)\, dy$ (resp. $\bar a(x) := \int \bar\alpha(x, y)\, \pi^{(0)}(y)\, dy$)
the average acceptance probability at $x$ for $P^{(1)}$ (resp. for $R^{(1)}$). Then we have

$$U(x)\big(1 - \theta_1(1 - a(x)) - (1 - \theta_1)(1 - \bar a(x))\big) = f(x) + \theta_1 \int \alpha(x, y)\, q(x, y)\, U(y)\, dy + (1 - \theta_1) \int \bar\alpha(x, y)\, \pi^{(0)}(y)\, U(y)\, dy.$$

Thus if $\pi^{(0)}$, $\pi^{(1)}$ and $q$ are such that $a$ and $\bar a$ remain bounded away from 0 and the integral
operators $h \mapsto \int \alpha(x, y)\, q(x, y)\, h(y)\, dy$ and $h \mapsto \int \bar\alpha(x, y)\, \pi^{(0)}(y)\, h(y)\, dy$ transform bounded measurable
functions into Lipschitz functions, then (19) holds. For example, if $\pi^{(0)}$, $\pi^{(1)}$ and $q$ are all positive
on $\mathcal{X}$ and sufficiently smooth, then (19) holds.

Remark 3.3. The result developed above relies heavily on the Lipschitz continuity assumption.
Under that assumption, we show that the stochastic process $\{\eta_n,\ n \ge 0\}$ lives in the Polish space
$C(\mathcal{X}, \mathbb{R})$, which allows us to use the standard machinery of weak convergence in Polish spaces. If
$f$ is only assumed measurable, the theorem above no longer holds. A similar result can still
be obtained using weak convergence techniques in non-separable metric spaces, but we do not
pursue this here.



3.5. An illustrative example 



Consider the following example. Suppose that we want to sample from the bivariate normal
distribution $\mathcal{N}(0, \Sigma)$, with covariance matrix

$$\Sigma = \begin{pmatrix} 0.96 & 2.44 \\ 2.44 & 7.04 \end{pmatrix}.$$

For this problem, we compare a Random Walk Metropolis (RWM) algorithm; the EE sampler;
the MCMC algorithm based on the limiting kernel of the EE sampler (call it limit EE sampler);
IR-MCMC; and the MCMC algorithm based on the limiting kernel of IR-MCMC (limit IR-MCMC
sampler).

For the RWM sampler, the proposal kernel is $\mathcal{N}(0, I_2)$, where $I_2$ is the 2-dimensional identity
matrix. For the adaptive chains, we use four chains with $\pi^{(0)} = \pi^{1/t_0}$, $\pi^{(1)} = \pi^{1/t_1}$, $\pi^{(2)} = \pi^{1/t_2}$
and $\pi^{(3)} = \pi$. We take $\theta_l = \theta = 0.5$ and $P^{(l)}$ is taken to be a RWM algorithm with target
$\pi^{(l)}$ and proposal $\mathcal{N}(0, I_2)$. It can be checked that assumption (A1) holds for this problem. We
simulate each of the five samplers for $n = 10{,}000$ iterations. We compare the samplers on their
Mean Square Errors (MSE) in estimating the first two moments of the two components of the
distribution $\pi$. We calculate the MSEs by repeating the simulations 100 times. The results are
reported in Table 1.
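
The RWM baseline of this experiment is simple enough to sketch. The code below is a minimal illustration, not the code used for Table 1: the starting point at the origin, the seed handling and the function names are assumptions, and the adaptive samplers are not included.

```python
import numpy as np

SIGMA = np.array([[0.96, 2.44], [2.44, 7.04]])
SIGMA_INV = np.linalg.inv(SIGMA)


def energy(x):
    """Energy E(x) = x' Sigma^{-1} x / 2 of the N(0, Sigma) target."""
    return 0.5 * x @ SIGMA_INV @ x


def rwm_chain(n_iter, rng):
    """Random Walk Metropolis with N(0, I_2) increments, started at the origin."""
    x = np.zeros(2)
    e_x = energy(x)
    out = np.empty((n_iter, 2))
    for i in range(n_iter):
        y = x + rng.standard_normal(2)
        e_y = energy(y)
        # accept with probability min(1, pi(y) / pi(x))
        if rng.random() < np.exp(min(0.0, e_x - e_y)):
            x, e_x = y, e_y
        out[i] = x
    return out


def mse_of_moments(n_rep, n_iter, seed=0):
    """MSE of the estimates of E(X1), E(X2), E(X1^2), E(X2^2) over replications."""
    rng = np.random.default_rng(seed)
    truth = np.array([0.0, 0.0, SIGMA[0, 0], SIGMA[1, 1]])
    est = np.empty((n_rep, 4))
    for r in range(n_rep):
        chain = rwm_chain(n_iter, rng)
        est[r] = np.concatenate([chain.mean(axis=0), (chain ** 2).mean(axis=0)])
    return ((est - truth) ** 2).mean(axis=0)
```

Calling `mse_of_moments(100, 10_000)` mirrors the replication scheme described above for the RWM row; the exact figures depend on the seed and on implementation details that the text does not specify.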

From these results we see (as expected) that the limit EE sampler is 3 to 25 times more
efficient than the RWM sampler, and the limit IR-MCMC sampler is 15 to 50 times more efficient than
the RWM sampler. But IR-MCMC itself is hardly more efficient than the RWM sampler. If we
take the computation times into account, it becomes hard to make the case that any of these
adaptive samplers is better than the plain RWM. Similar conclusions can be drawn for the EE
sampler.







                          E(X_1)    E(X_2)    E(X_1^2)   E(X_2^2)
RWM             MSE       0.0099    0.0803    0.0091     0.5525
                Ratios    1.0       1.0       1.0        1.0
IR-MCMC         MSE       0.0098    0.0774    0.0047     0.2962
                Ratios    1.00      1.04      1.95       1.87
limit IR-MCMC   MSE       0.0002    0.0017    0.0006     0.0296
                Ratios    48.43     46.20     14.18      18.66
EE              MSE       0.0057    0.0435    0.0045     0.2810
                Ratios    1.74      1.84      2.02       1.97
limit EE        MSE       0.0004    0.0030    0.0034     0.1966
                Ratios    25.99     26.36     2.67       2.81

Table 1
Mean square error and ratios (with respect to the RWM sampler) for IR-MCMC, limit IR-MCMC, EE and limit
EE. Based on 100 replications of 10,000 iterations of each sampler.



4. Proofs 

4.1. Preliminary results on kernels of the form $P_\nu^{(l)}$

For a probability measure $\nu$ and $l = 1, \ldots, K$, let $P_\nu^{(l)}$ be as in (2) with $\omega^{(l)} \equiv 1$ and $T^{(l)}$ as in (5).
The following lemma shows that $P_\nu^{(l)}$ satisfies drift and minorization conditions with constants
that do not depend on $\nu$.

Lemma 4.1. Assume (A1). Then there exists $\lambda_l' \in (0, 1)$ that does not depend on $\nu$ such that for
$x \in \mathcal{X}$ and $A \in \mathcal{B}$:

$$\big(P_\nu^{(l)}\big)^{n_0}(x, A) \ge \theta_l^{n_0}\, \varepsilon_l\, \phi_l(A)\, \mathbf{1}_{C_l}(x), \tag{24}$$

and

$$P_\nu^{(l)} V(x) \le \lambda_l' V(x) + b_l \mathbf{1}_{C_l}(x), \tag{25}$$

where $C_l$, $\phi_l$, $b_l$, $\varepsilon_l$ and $V$ are as in (A1).

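Proof. We have $P_\nu^{(l)} \ge \theta_l P^{(l)}$. Therefore (24) follows from the minorization condition (8).
Define $\delta_l = \big(\kappa^{-1}(t_l^{-1} - t_{l-1}^{-1}) - 1\big)^{-1}$. We will show that

$$\int \nu(dy)\, T^{(l)}(y, x, V) \le (1 + \delta_l) V(x). \tag{26}$$

Given the drift condition (9), this will imply:

$$P_\nu^{(l)} V(x) \le \big(\theta_l \lambda_l + (1 - \theta_l)(1 + \delta_l)\big) V(x) + b_l \mathbf{1}_{C_l}(x) \le \lambda_l' V(x) + b_l \mathbf{1}_{C_l}(x),$$

where $\lambda_l' = \theta_l \lambda_l + (1 - \theta_l)(1 + \delta_l) \in (0, 1)$ by the conditions on $\kappa$ and $\theta_l$ in (A1).

Observe that $r^{(l)}(x) = e^{-(t_l^{-1} - t_{l-1}^{-1}) E(x)}$, $t_l^{-1} - t_{l-1}^{-1} > 0$ and $V(x) = c\, e^{\kappa E(x)} \ge 1$, $\kappa \in (0, 1)$. This
implies that $r^{(l)}(y) / r^{(l)}(x) \ge 1$ if and only if $E(y) \le E(x)$. Denote $\mathcal{A}(x) = \{y \in \mathcal{X} : E(y) \le E(x)\}$
and $\mathcal{R}(x) = \{y \in \mathcal{X} : E(y) > E(x)\}$. Then we have:

$$\int \nu(dy)\, T^{(l)}(y, x, V) = \int_{\mathcal{A}(x)} \nu(dy)\, T^{(l)}(y, x, V) + \int_{\mathcal{R}(x)} \nu(dy)\, T^{(l)}(y, x, V)$$
$$= \int_{\mathcal{A}(x)} \nu(dy)\, V(y) + \int_{\mathcal{R}(x)} \nu(dy)\, \frac{r^{(l)}(y)}{r^{(l)}(x)}\, V(y) + V(x) \int_{\mathcal{R}(x)} \nu(dy) \Big(1 - \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big)$$
$$= \int_{\mathcal{A}(x)} \nu(dy)\, V(y) + V(x) \int_{\mathcal{R}(x)} \nu(dy) + \int_{\mathcal{R}(x)} \nu(dy)\, \frac{r^{(l)}(y)}{r^{(l)}(x)}\, \big(V(y) - V(x)\big)$$
$$\le V(x) + V(x) \int_{\mathcal{R}(x)} \nu(dy)\, \frac{r^{(l)}(y)}{r^{(l)}(x)} \Big(\frac{V(y)}{V(x)} - 1\Big)$$
$$\le V(x)\, (1 + \delta_l). \qquad □$$

In the last line we use the following inequality: for $0 < x < y$, $e^{-y}(e^{x} - 1) \le x/(y - x)$.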
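
The elementary inequality used in the last step can be checked numerically. The sketch below assumes it reads $e^{-y}(e^{x} - 1) \le x/(y - x)$ for $0 < x < y$, which is the form needed with $x = \kappa\,(E(y') - E(x'))$ and $y = (t_l^{-1} - t_{l-1}^{-1})(E(y') - E(x'))$ to bound the integrand by $\delta_l$; the helper names are hypothetical.

```python
import math


def lhs(x, y):
    """Left-hand side exp(-y) * (exp(x) - 1); expm1 keeps small x accurate."""
    return math.exp(-y) * math.expm1(x)


def rhs(x, y):
    """Right-hand side x / (y - x), defined for x < y."""
    return x / (y - x)


def max_violation(grid):
    """Largest value of lhs - rhs over all grid pairs with x < y."""
    worst = -math.inf
    for x in grid:
        for y in grid:
            if x < y:
                worst = max(worst, lhs(x, y) - rhs(x, y))
    return worst
```

A non-positive return value on a fine grid is of course only a sanity check; the inequality itself follows from maximizing $a \mapsto e^{-ya}(e^{xa} - 1)$ over $a \ge 0$, whose maximum is at most $x/(y - x)$.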

From Lemma 4.1, we deduce that for any probability measure $\nu$, $P_\nu^{(l)}$ has an invariant
distribution $\pi_\nu^{(l)}$ such that

$$\pi_\nu^{(l)}(V) \le \frac{b_l}{1 - \lambda_l'}. \tag{27}$$

See [17], Theorems 15.0.1 and 14.3.7. The lemma also implies that for any $\beta \in (0, 1]$, there
exist constants $C_\beta < \infty$ and $\rho_\beta \in (0, 1)$ that do not depend on $\nu$ such that:

$$\big\| \big(P_\nu^{(l)}\big)^k(x, \cdot) - \pi_\nu^{(l)}(\cdot) \big\|_{V^\beta} \le C_\beta\, \rho_\beta^k\, V^\beta(x), \quad k \ge 0,\ x \in \mathcal{X}. \tag{28}$$

See e.g. [7] for a proof. The following lemma holds.

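Lemma 4.2. Fix $\beta \in [0, 1]$ and let $\mu$ and $\nu$ be two probability measures on $(\mathcal{X}, \mathcal{B})$. Then

$$\big|\big|\big| P_\mu^{(l)} - P_\nu^{(l)} \big|\big|\big|_{V^\beta} \le 2\, \|\mu - \nu\|_{V^\beta}. \tag{29}$$

Proof. For $f \in L_{V^\beta}$ such that $|f|_{V^\beta} \le 1$, we have

$$P_\mu^{(l)} f(x) - P_\nu^{(l)} f(x) = (1 - \theta_l) \int T^{(l)}(y, x, f)\, \big(\mu(dy) - \nu(dy)\big),$$

where $T^{(l)}(y, x, f) = \min\big(1, r^{(l)}(y)/r^{(l)}(x)\big)\, (f(y) - f(x)) + f(x)$. Therefore

$$\frac{\big|P_\mu^{(l)} f(x) - P_\nu^{(l)} f(x)\big|}{V^\beta(x)} = (1 - \theta_l) \Big| \int \frac{T^{(l)}(y, x, f)}{V^\beta(x)\, V^\beta(y)}\, V^\beta(y)\, \big(\mu(dy) - \nu(dy)\big) \Big|.$$

Now for $|f|_{V^\beta} \le 1$, $\dfrac{|T^{(l)}(y, x, f)|}{V^\beta(x)\, V^\beta(y)} \le 2$ for all $x \in \mathcal{X}$. Therefore

$$\frac{\big|P_\mu^{(l)} f(x) - P_\nu^{(l)} f(x)\big|}{V^\beta(x)} \le 2 \sup_{|f|_{V^\beta} \le 1} \Big| \int f(y)\, \big(\mu(dy) - \nu(dy)\big) \Big| = 2\, \|\mu - \nu\|_{V^\beta}. \qquad □$$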



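For $l \in \{1, \ldots, K\}$, define the kernel

$$N_\mu^{(l)} f(x) = \int \mu(dy)\, f(y)\, \min\Big(1, \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big), \quad x \in \mathcal{X}.$$

Lemma 4.3. Let $\mu$ be a probability measure on $(\mathcal{X}, \mathcal{B})$. For $x_1, x_2 \in \mathcal{X}$, and $f \in L_{V^\beta}$,
$\beta \in [0, 1]$,

$$\big| N_\mu^{(l)} f(x_1) - N_\mu^{(l)} f(x_2) \big| \le |f|_{V^\beta}\, c^\beta\, \big| e^{\tau E(x_1)} - e^{\tau E(x_2)} \big| \int \mu(dy)\, e^{-(\tau - \kappa\beta) E(y)}, \tag{30}$$

with $\tau = 1/t_l - 1/t_{l-1}$ and $\kappa$ as in (A1).

Proof. Fix $x_1$ and $x_2$ and define $\Delta(y) = V^\beta(y)\, \big| \min\big(1, r^{(l)}(y)/r^{(l)}(x_1)\big) - \min\big(1, r^{(l)}(y)/r^{(l)}(x_2)\big) \big|$, so that
$|N_\mu^{(l)} f(x_1) - N_\mu^{(l)} f(x_2)| \le |f|_{V^\beta} \int \mu(dy)\, \Delta(y)$. On $r^{(l)}(y) \ge \max(r^{(l)}(x_1), r^{(l)}(x_2))$, $\Delta(y) = 0$.
On $r^{(l)}(x_1) \le r^{(l)}(y) < r^{(l)}(x_2)$:

$$\Delta(y) = V^\beta(y) \Big(1 - \frac{r^{(l)}(y)}{r^{(l)}(x_2)}\Big) = c^\beta e^{-(\tau - \kappa\beta) E(y)} \big(e^{\tau E(y)} - e^{\tau E(x_2)}\big) \le c^\beta \big(e^{\tau E(x_1)} - e^{\tau E(x_2)}\big)\, e^{-(\tau - \kappa\beta) E(y)},$$

and the case $r^{(l)}(x_2) \le r^{(l)}(y) < r^{(l)}(x_1)$ is symmetric. Similarly, on $r^{(l)}(y) < \min(r^{(l)}(x_1), r^{(l)}(x_2))$,

$$\Delta(y) = c^\beta\, \big| e^{\tau E(x_1)} - e^{\tau E(x_2)} \big|\, e^{-(\tau - \kappa\beta) E(y)}.$$

Putting the three parts together yields the lemma. □

Remark 4.1. Lemma 4.3 will be useful in deriving a uniform law of large numbers for $\{X_n^{(l)}\}$.
Actually, this lemma shows that if the function $E$ is continuous then the kernel $N_\mu^{(l)}$ is a strong
Feller kernel that transforms a bounded function $f$ into a continuous bounded function $N_\mu^{(l)} f$
(uniformly in $\mu$). We will use this later.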



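4.2. Poisson equation

A straightforward consequence of Lemma 4.1 is that for any $f \in L_{V^\beta}$, $\beta \in (0, 1]$, the function

$$U_\nu^{(l)} f(x) := \sum_{k=0}^{\infty} \Big[\big(P_\nu^{(l)}\big)^k f(x) - \pi_\nu^{(l)}(f)\Big] \tag{31}$$

is well defined and

$$\big|U_\nu^{(l)} f\big|_{V^\beta} \le C\, |f|_{V^\beta}, \tag{32}$$

where $C$ is finite and does not depend on $\nu$ nor $f$. $U_\nu^{(l)} f$ satisfies the (Poisson) equation

$$U_\nu^{(l)} f(x) - P_\nu^{(l)} U_\nu^{(l)} f(x) = f(x) - \pi_\nu^{(l)}(f). \tag{33}$$

Lemmas 4.1 and 4.2 imply that for all $\beta \in (0, 1]$, and $\mu$, $\nu$ probability measures on $(\mathcal{X}, \mathcal{B})$,

$$\big\| \pi_\mu^{(l)} - \pi_\nu^{(l)} \big\|_{V^\beta} \le C\, \|\mu - \nu\|_{V^\beta}, \tag{34}$$

and, for $f \in L_{V^\beta}$,

$$\big| U_\mu^{(l)} f - U_\nu^{(l)} f \big|_{V^\beta} \le C\, |f|_{V^\beta}\, \|\mu - \nu\|_{V^\beta} \tag{35}$$

and

$$\big| P_\mu^{(l)} U_\mu^{(l)} f - P_\nu^{(l)} U_\nu^{(l)} f \big|_{V^\beta} \le C\, |f|_{V^\beta}\, \|\mu - \nu\|_{V^\beta}. \tag{36}$$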

The inequalities (34), (35) and (36) can be derived for example by adapting the proofs of Propo- 
sition 3 of [•')]. We omit the details. An important point is the fact that the constant C (whose 
actual value can change from one equation to the other) does not depend on $f$, $\nu$ or $\mu$.
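4.3. Proof of Theorem 3.1

Let $f : (\mathcal{M}, \mathcal{B}_{\mathcal{M}}) \times (\mathcal{X}, \mathcal{B}) \to \mathbb{R}$ be a measurable function. We will use the notation $f_\mu(x)$ when
evaluating $f$. To lighten the notation we write $\mu_k$ for $\mu_k^{(l-1)}$, $X_k$ for $X_k^{(l)}$, $P_\mu$ for $P_\mu^{(l)}$, $U_\mu$ for $U_\mu^{(l)}$ and $\pi_\mu$ for $\pi_\mu^{(l)}$.
We introduce the partial sum associated to $\{X_n^{(l)},\ n \ge 0\}$:

$$S_n^{(l)}(f) := \sum_{k=1}^{n} f_{\mu_{k-1}}(X_k) = \sum_{k=1}^{n} \pi_{\mu_{k-1}}\big(f_{\mu_{k-1}}\big) + \sum_{k=1}^{n} \Big[f_{\mu_{k-1}}(X_k) - \pi_{\mu_{k-1}}\big(f_{\mu_{k-1}}\big)\Big].$$

Using the Poisson equation (33), we have the decomposition

$$S_n^{(l)}(f) = \sum_{k=1}^{n} \pi_{\mu_{k-1}}\big(f_{\mu_{k-1}}\big) + M_n^{(l)}(f) + R_{n,1}^{(l)}(f) + R_{n,2}^{(l)}(f), \tag{37}$$

where

$$M_n^{(l)}(f) = \sum_{k=1}^{n} D_k^{(l)}(f), \qquad D_k^{(l)}(f) = U_{\mu_{k-1}} f_{\mu_{k-1}}(X_k) - P_{\mu_{k-1}} U_{\mu_{k-1}} f_{\mu_{k-1}}(X_{k-1}),$$

$$R_{n,1}^{(l)}(f) = P_{\mu_0} U_{\mu_0} f_{\mu_0}(X_0) - P_{\mu_n} U_{\mu_n} f_{\mu_n}(X_n),$$

and

$$R_{n,2}^{(l)}(f) = \sum_{k=1}^{n} \Big[P_{\mu_k} U_{\mu_k} f_{\mu_k}(X_k) - P_{\mu_{k-1}} U_{\mu_{k-1}} f_{\mu_{k-1}}(X_k)\Big].$$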

Lemma 4.4. 



sup sup E(y(xg ^V(^?)) <oo- 



l<l<Kk,k'>Q 

Proof. This is a straightforward consequence of the (uniform in $\nu$) drift condition on $P_\nu^{(l)}$. □

Lemma 4.5. Let $p > 1$ be such that $p\beta \le 1$. There exists a finite constant $C$ such that

$$\mathbb{E}\Big[\big|R_{n,2}^{(l)}(f)\big|^p\Big] \le C (\log n)^p.$$

Moreover $n^{-1} R_{n,2}^{(l)}(f)$ converges $\mathbb{P}$-almost surely to 0.

Proof. We use (36), (32) and (11) to obtain:



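$$\Big| P^{(l)}_{\mu_k} U^{(l)}_{\mu_k} f_{\mu_k}(X_k^{(l)}) - P^{(l)}_{\mu_{k-1}} U^{(l)}_{\mu_{k-1}} f_{\mu_{k-1}}(X_k^{(l)}) \Big| \le C\, \|\mu_k - \mu_{k-1}\|_{V^\beta}\, V^\beta(X_k^{(l)}), \tag{38}$$

where $\mu_k$ stands for $\mu_k^{(l-1)}$ and the constant $C$ depends on $f$ only through $\sup_\nu |f_\nu|_{V^\beta}$ and the constant in (12). Moreover

$$\|\mu_n - \mu_{n-1}\|_{V^\beta} = \frac{1}{n} \sup_{|h|_{V^\beta} \le 1} \big| h(X_n^{(l-1)}) - \mu_{n-1}(h) \big|.$$

In view of Lemma 4.4 and since $p\beta \le 1$, $\mathbb{E}\big[\big(n\, \|\mu_n - \mu_{n-1}\|_{V^\beta}\, V^\beta(X_n^{(l)})\big)^p\big] \le C$
for some finite constant $C$ that does not depend on $n$. Therefore given (38) and (11), we can use
Minkowski's inequality to conclude the first part of the lemma.

For the second part, by Kronecker's lemma, it is enough to show that the series

$$\sum_{k \ge 1} k^{-1} \Big[ P^{(l)}_{\mu_k} U^{(l)}_{\mu_k} f_{\mu_k}(X_k^{(l)}) - P^{(l)}_{\mu_{k-1}} U^{(l)}_{\mu_{k-1}} f_{\mu_{k-1}}(X_k^{(l)}) \Big]$$

converges almost surely. This will follow if we show that

$$\sum_{k \ge 1} k^{-1}\, \mathbb{E}\Big| P^{(l)}_{\mu_k} U^{(l)}_{\mu_k} f_{\mu_k}(X_k^{(l)}) - P^{(l)}_{\mu_{k-1}} U^{(l)}_{\mu_{k-1}} f_{\mu_{k-1}}(X_k^{(l)}) \Big|$$

is finite. But from the above calculations, we have seen that

$$\mathbb{E}\Big| P^{(l)}_{\mu_k} U^{(l)}_{\mu_k} f_{\mu_k}(X_k^{(l)}) - P^{(l)}_{\mu_{k-1}} U^{(l)}_{\mu_{k-1}} f_{\mu_{k-1}}(X_k^{(l)}) \Big| \le C k^{-1}.$$

The lemma thus follows. □

Lemma 4.6. Let $p > 1$ be such that $\beta p \le 1$. Then

$$\sup_{n} \mathbb{E}\Big[\big|R_{n,1}^{(l)}(f)\big|^p\Big] < \infty.$$

Moreover for any $\delta > 0$,

$$\Pr\Big( \sup_{m \ge n} m^{-1} \big|R_{m,1}^{(l)}(f)\big| > \delta \Big) \to 0, \quad \text{as } n \to \infty.$$

Proof. The first part is a direct consequence of (11) and (32). For the second part, by Markov's
inequality, we see that

$$\Pr\Big( \sup_{m \ge n} m^{-1} \big|R_{m,1}^{(l)}(f)\big| > \delta \Big) \le \sum_{m \ge n} \Pr\Big( m^{-1} \big|R_{m,1}^{(l)}(f)\big| > \delta \Big) \le C \delta^{-p} \sum_{m \ge n} m^{-p} \to 0 \quad \text{as } n \to \infty. \qquad □$$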
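Lemma 4.7. Let $p > 1$ be such that $p\beta \le 1$. There exists a finite constant $C$ such that

$$\mathbb{E}\Big[\big|M_n^{(l)}(f)\big|^p\Big] \le C\, n^{\max(1,\, p/2)}.$$

Proof. By Burkholder's inequality applied to the martingale $\{M_n^{(l)}(f)\}$, we get:

$$\mathbb{E}\Big[\big|M_n^{(l)}(f)\big|^p\Big] \le C\, \mathbb{E}\Big[\Big(\sum_{k=1}^{n} \big|D_k^{(l)}(f)\big|^2\Big)^{p/2}\Big].$$

If $p \ge 2$, we apply Minkowski's inequality and use (32) to conclude that

$$\mathbb{E}\Big[\big|M_n^{(l)}(f)\big|^p\Big] \le C \Big\{ \sum_{k=1}^{n} \Big(\mathbb{E}\big|D_k^{(l)}(f)\big|^p\Big)^{2/p} \Big\}^{p/2} \le C\, n^{p/2}.$$

If $1 < p < 2$, we use the inequality $(a + b)^\alpha \le a^\alpha + b^\alpha$, valid for all $a, b \ge 0$, $\alpha \in [0, 1]$, to write

$$\mathbb{E}\Big[\big|M_n^{(l)}(f)\big|^p\Big] \le C\, \mathbb{E}\Big[\sum_{k=1}^{n} \big|D_k^{(l)}(f)\big|^p\Big] \le C \sum_{k=1}^{n} \mathbb{E}\Big(V^{p\beta}(X_k^{(l)})\Big) \le C n. \qquad □$$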



To deal with the remaining term, we will rely on the following result which is also of some 
independent interest. 

Lemma 4.8. Let $\mu, \mu_1, \ldots$ be a sequence of probability measures on a measurable space $(\mathcal{X}, \mathcal{B})$ such that $\mu_n(A) \to \mu(A)$ for all $A \in \mathcal{B}$ and let $f, f_1, \ldots$ be a sequence of measurable real-valued functions defined on $(\mathcal{X}, \mathcal{B})$ such that $\sup_n |f_n|_V < \infty$ and $f_n(x) \to f(x)$ for all $x \in \mathcal{X}$, for some measurable function $V : (\mathcal{X}, \mathcal{B}) \to (0, \infty)$ such that $\mu(V) < \infty$ and $\sup_n \mu_n(V^\alpha) < \infty$ for some $\alpha > 1$. Then

$$\lim_{n\to\infty} \mu_n(f_n) = \mu(f).$$

Proof. By [19] (Chap 11, Proposition 18) we only need to prove that $\mu_n(V) \to \mu(V)$. By [19] (Chap 11, Proposition 17), we already have $\mu(V) \le \liminf_{n\to\infty}\mu_n(V)$. Now we show that $\limsup_{n\to\infty}\mu_n(V) \le \mu(V)$, which will prove the lemma.

Since $V > 0$, there exists a sequence of nonnegative simple measurable functions $\{V_n\}$ that converges increasingly to $V$ $\mu$-a.s. For $k \ge 1$, $N \ge 1$, define $E_{k,N} = \{x \in \mathcal{X} : V(x) - V_p(x) > 1/k \text{ for some } p \ge N\}$. Clearly, $E_{k,N} \in \mathcal{B}$ and $\mu(E_{k,N}) \to 0$ as $N \to \infty$ for any $k \ge 1$. Fix $k, N \ge 1$. Then for any $n \ge 1$ and any $p \ge N$, we have:

$$\begin{aligned}
\mu_n(V) &= \mu_n(V_p) + \mu_n(V - V_p) \\
&= \mu_n(V_p) + \int \mu_n(dx)\,\big(V(x) - V_p(x)\big) \\
&\le \mu_n(V_p) + \int_{E_{k,N}} \mu_n(dx)\,V(x) + \int_{E_{k,N}^c} \mu_n(dx)\,\big(V(x) - V_p(x)\big) \\
&\le \mu_n(V_p) + C\big(\mu_n(E_{k,N})\big)^{q} + \frac{1}{k}, \qquad (39)
\end{aligned}$$

with $q = 1 - 1/\alpha$, for some finite constant $C$. The last inequality uses Hölder's inequality and the assumption that $\sup_n \mu_n(V^\alpha) < \infty$ for some $\alpha > 1$. Since $V_p$ is simple, $\mu_n(V_p) \to \mu(V_p)$. Also $\mu_n(E_{k,N}) \to \mu(E_{k,N})$. With these and letting $n \to \infty$ and $p \to \infty$ in (39), we have by monotone convergence:

$$\limsup_{n\to\infty}\mu_n(V) \le \mu(V) + C\big(\mu(E_{k,N})\big)^{q} + \frac{1}{k}.$$

Letting $N \to \infty$ and then $k \to \infty$, we get $\limsup_{n\to\infty}\mu_n(V) \le \mu(V)$. □

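Lemma 4.9. $\pi^{(l)}_{\mu_n^{(l-1)}}\big(f_{\mu_n^{(l-1)}}\big) - \pi^{(l)}\big(f_{\pi^{(l-1)}}\big) \to 0$ as $n \to \infty$ with P probability one.

Proof. To simplify the notations, we write $\pi_n^{(l)}$, $P_n^{(l)}$ and $f_n$ instead of $\pi^{(l)}_{\mu_n^{(l-1)}}$, $P^{(l)}_{\mu_n^{(l-1)}}$ and $f_{\mu_n^{(l-1)}}$ respectively. For $x \in \mathcal{X}$, and $n, m \ge 1$, we have:

$$\begin{aligned}
\Big|\pi_n^{(l)}(f_n) - \pi^{(l)}\big(f_{\pi^{(l-1)}}\big)\Big| &\le \Big|\pi_n^{(l)}(f_n) - \big(P_n^{(l)}\big)^m f_n(x)\Big| + \Big|\big(P_n^{(l)}\big)^m f_n(x) - \big(K^{(l)}\big)^m f_{\pi^{(l-1)}}(x)\Big| \\
&\quad + \Big|\big(K^{(l)}\big)^m f_{\pi^{(l-1)}}(x) - \pi^{(l)}\big(f_{\pi^{(l-1)}}\big)\Big| \\
&\le 2\sup_{\mu}|f_\mu|_{V^\beta}\, C\, V^\beta(x)\rho^m + \Big|\big(P_n^{(l)}\big)^m f_n(x) - \big(K^{(l)}\big)^m f_{\pi^{(l-1)}}(x)\Big|, \qquad (40)
\end{aligned}$$

using (28). We will show next that there exists $\mathcal{D}_0 \in \mathcal{F}$, with $\Pr(\mathcal{D}_0) = 1$, such that for each path $\omega \in \mathcal{D}_0$, $\big(P_n^{(l)}\big)^m f_n(x)(\omega)$ converges to $\big(K^{(l)}\big)^m f_{\pi^{(l-1)}}(x)$ as $n \to \infty$ for all $x \in \mathcal{X}$, all $m \ge 0$. Then, going back to (40), we can conclude that for each $\omega \in \mathcal{D}_0$,

$$\limsup_{n\to\infty}\Big|\pi_n^{(l)}(f_n) - \pi^{(l)}\big(f_{\pi^{(l-1)}}\big)\Big|(\omega) \le 2C V^\beta(x)\rho^m,$$

and the proof will be finished by letting $m \to \infty$.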
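We can rewrite $P_n^{(l)}(x, A)$ as:

$$P_n^{(l)}(x,A) = \theta_l P^{(l)}(x,A) + (1-\theta_l)N_n^{(l)}(x,A) + (1-\theta_l)\mathbf{1}_A(x)\Big(1 - N_n^{(l)}(x,\mathcal{X})\Big),$$

where $N_n^{(l)}(x,A) = \int \mu_n^{(l-1)}(dy)\,\mathbf{1}_A(y)\min\Big(1, \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big)$ and $N_n^{(l)}(x,\mathcal{X}) = \int \mu_n^{(l-1)}(dy)\min\Big(1, \frac{r^{(l)}(y)}{r^{(l)}(x)}\Big)$.
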
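By the law of large numbers assumed for $\{X_n^{(l-1)}, n \ge 0\}$, and since $(\mathcal{X}, \mathcal{B})$ is Polish, there exists a dense countable subset $\mathcal{C}$ in $\mathcal{X}$, a countable generating algebra $\mathcal{B}_0$ of $\mathcal{B}$ and $\mathcal{D} \in \mathcal{F}$, $\Pr(\mathcal{D}) = 1$, such that for all $x \in \mathcal{C}$ and all $A \in \mathcal{B}_0$:

$$N_n^{(l)}(x,A) \to N^{(l)}_{\pi^{(l-1)}}(x,A), \quad \text{as } n\to\infty, \qquad (41)$$

$$N_n^{(l)}(x,\mathcal{X}) \to N^{(l)}_{\pi^{(l-1)}}(x,\mathcal{X}), \quad \text{as } n\to\infty. \qquad (42)$$

We can also choose $\mathcal{D}$ such that the convergence of $f_n(x)(\omega)$ to $f_{\pi^{(l-1)}}(x)$ for all $x \in \mathcal{X}$, which is assumed in the theorem, holds for all $\omega \in \mathcal{D}$. If we fix a sample path $\omega \in \mathcal{D}$, and we fix $x \in \mathcal{C}$, the convergence in (41) can actually be extended to all $A \in \mathcal{B}$ by a classical measure theory argument. Also, again for $\omega \in \mathcal{D}$ and $A \in \mathcal{B}$ fixed, we can extend the convergence in (41-42) to hold for all $x \in \mathcal{X}$. To see why, take $x \in \mathcal{X}$ arbitrary. Lemma 4.3 and the continuity of $E$ imply that $N_\mu^{(l)}(x,A)$ is a continuous function of $x$ uniformly in $\mu$. Since $\mathcal{C}$ is dense, for all $k \ge 1$, there is $x_k \in \mathcal{C}$ such that:

$$\Big|N_\mu^{(l)}(x,A) - N_\mu^{(l)}(x_k,A)\Big| \le \frac{1}{k}$$

for all $\mu$. In particular, $N_n^{(l)}(x,A) \ge N_n^{(l)}(x_k,A) - 1/k$ for all $n \ge 1$. As $n \to \infty$, it follows that $\liminf_{n\to\infty} N_n^{(l)}(x,A) \ge N^{(l)}_{\pi^{(l-1)}}(x_k,A) - 1/k$. As $k \to \infty$, by the continuity of $N^{(l)}_{\pi^{(l-1)}}(\cdot, A)$ (Lemma 4.3), we see that $\liminf_{n\to\infty} N_n^{(l)}(x,A) \ge N^{(l)}_{\pi^{(l-1)}}(x,A)$. Similarly, we obtain $\limsup_{n\to\infty} N_n^{(l)}(x,A) \le N^{(l)}_{\pi^{(l-1)}}(x,A)$, so that $\lim_{n\to\infty} N_n^{(l)}(x,A) = N^{(l)}_{\pi^{(l-1)}}(x,A)$. Similarly, $\lim_{n\to\infty} N_n^{(l)}(x,\mathcal{X}) = N^{(l)}_{\pi^{(l-1)}}(x,\mathcal{X})$.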

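This shows that for each sample path $\omega \in \mathcal{D}$, $P_n^{(l)}(x,A)$ converges to $K^{(l)}(x,A)$ for all $x \in \mathcal{X}$, all $A \in \mathcal{B}$. By a successive application of Lemma 4.8 (with $V \equiv 1$), we can therefore conclude that for each sample path $\omega \in \mathcal{D}$:

$$\big(P_n^{(l)}\big)^m(x,A) \to \big(K^{(l)}\big)^m(x,A), \quad \text{as } n\to\infty \text{ for all } x\in\mathcal{X},\ A\in\mathcal{B},\ m\ge 0. \qquad (43)$$

Since $\big(P_\mu^{(l)}\big)^m V(x)$ is uniformly bounded in $\mu$ and $m$, we can apply Lemma 4.8 again to conclude that for each $\omega \in \mathcal{D}$, $\big(P_n^{(l)}\big)^m f_n(x)$ converges to $\big(K^{(l)}\big)^m f_{\pi^{(l-1)}}(x)$ for all $x \in \mathcal{X}$, all $m \ge 0$, which ends the proof.

□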



4.3.1. Proof of Theorem 3.1 

We are now in position to prove Theorem 3.1. Since $\beta \in [0, 1)$, we can take $p = 1/\beta$ in Lemma 4.5 and Lemma 4.6 to conclude that $R_{n,i}^{(l)}(f)/n \to 0$, P-a.s. for $i = 1, 2$, and by the strong law of large numbers for martingales (['•]), we conclude that $M_n^{(l)}(f)/n \to 0$, P-a.s. We finish the proof using Lemma 4.9.
4.4. Proof of Theorem 3.2

Take $p = 1/\beta > 2$ (since $\beta \in [0, 1/2)$). By the martingale approximation (37),

$$S_n^{(l)}(f) - \sum_{k=1}^n \pi^{(l)}_{\mu_{k-1}^{(l-1)}}\big(f_{\mu_{k-1}^{(l-1)}}\big) = M_n^{(l)}(f) + R_{n,1}^{(l)}(f) + R_{n,2}^{(l)}(f).$$

As above, we will simplify the notations by writing $\pi_n^{(l)}(f_n)$ instead of $\pi^{(l)}_{\mu_n^{(l-1)}}\big(f_{\mu_n^{(l-1)}}\big)$, and similarly for $U_n^{(l)}$, $P_n^{(l)}$, etc.

By Lemma 4.5-4.6, $E\big[|R_{n,i}^{(l)}(f)|^p\big] = O((\log n)^p)$. We then deduce that $R_{n,i}^{(l)}(f)/\sqrt{n}$ converges in probability to 0, and it remains to show that a central limit theorem holds for the martingale $\{M_n^{(l)}(f), \mathcal{F}_n\}$. We need to show that the Lindeberg condition holds:



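$$\frac{1}{n}\sum_{k=1}^n E\Big[\big(D_k^{(l)}(f)\big)^2\,\mathbf{1}\big\{|D_k^{(l)}(f)| \ge \epsilon\sqrt{n}\big\}\,\Big|\,\mathcal{F}_{k-1}\Big] \to 0, \quad \text{for all } \epsilon > 0 \text{ as } n\to\infty, \qquad (44)$$

and that

$$\frac{1}{n}\sum_{k=1}^n E\Big[\big(D_k^{(l)}\big)^2(f)\,\Big|\,\mathcal{F}_{k-1}\Big] \to \sigma^2(f), \quad \text{as } n\to\infty, \qquad (45)$$

where $\sigma^2(f) = \pi^{(l)}(f^2) + 2\sum_{j=1}^\infty \pi^{(l)}\big(f\,(K^{(l)})^j f\big)$. Since $\sup_n E\big[|D_n^{(l)}(f)|^p\big] < \infty$ for $p > 2$, it follows that the Lindeberg condition (44) holds.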

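For the law of large numbers, we need some notations. Let $U^{(l)}$ denote the fundamental kernel of the limiting kernel $K^{(l)}$ and define the functions $\Delta_n^{(1)}(x) = P_n^{(l)}\big(U_n^{(l)}f\big)^2(x)$ and $\Delta_n^{(2)}(x) = \big[P_n^{(l)}U_n^{(l)}f(x)\big]^2$. Similarly, define $\Delta^{(1)}(x) = K^{(l)}\big(U^{(l)}f\big)^2(x)$ and $\Delta^{(2)}(x) = \big[K^{(l)}U^{(l)}f(x)\big]^2$. Then we can rewrite:

$$\frac{1}{n}\sum_{k=1}^n E\Big[\big(D_k^{(l)}\big)^2(f)\,\Big|\,\mathcal{F}_{k-1}\Big] = \frac{1}{n}\sum_{k=1}^n \Delta_{k-1}^{(1)}\big(X_{k-1}^{(l)}\big) - \frac{1}{n}\sum_{k=1}^n \Delta_{k-1}^{(2)}\big(X_{k-1}^{(l)}\big).$$

Fix $f \in L_{V^\beta}$. We have seen in the proof of Theorem 3.1 that $\pi_n^{(l)}(f)$ converges almost surely to $\pi^{(l)}(f)$. Combined with (43) and using dominated convergence, it follows that there is $\mathcal{D} \in \mathcal{F}$, $\Pr(\mathcal{D}) = 1$, such that for all sample paths $\omega \in \mathcal{D}$, $U_n^{(l)}f(x)$ converges to $U^{(l)}f(x)$ for all $x \in \mathcal{X}$. By virtue of Lemma 4.8, it follows that for all $\omega \in \mathcal{D}$, $\Delta_n^{(j)}(x)$ converges to $\Delta^{(j)}(x)$ for all $x \in \mathcal{X}$, $j = 1, 2$. Then the strong law of large numbers (Theorem 3.1) implies that $\frac{1}{n}\sum_{k=1}^n E\big((D_k^{(l)})^2(f)\,|\,\mathcal{F}_{k-1}\big)$ converges almost surely to $\pi^{(l)}\Big(K^{(l)}\big(U^{(l)}f\big)^2 - \big[K^{(l)}U^{(l)}f\big]^2\Big)$, which is equal to $\sigma^2(f) = \pi^{(l)}(f^2) + 2\sum_{j=1}^\infty \pi^{(l)}\big(f\,(K^{(l)})^j f\big)$.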
4.5. Proof of Theorem 3.3

We continue with the notations of Section 3.4.

Lemma 4.10. Under the assumptions of Theorem 3.3, there exists a finite constant $c_0$ such that $|\Gamma(x_1, x) - \Gamma(x, x)| \le c_0|x_1 - x|$, for all $x, x_1 \in \mathcal{X}$.

Proof. Given the expression of $\Gamma$ in (21), it is enough to show that $|U_x^{(0)}(y) - U_{x_1}^{(0)}(y)| \le c_0|x - x_1|$. But since

$$U_x^{(0)}(y) - U_{x_1}^{(0)}(y) = \sum_{j\ge 0}\big[\bar P^{(0)}\big]^j\big(H_x(y) - H_{x_1}(y)\big)$$

(where for a kernel $P$ with invariant distribution $\pi$, $\bar P = P - \pi$), the lemma follows if we show that there exists a finite constant $c_0$ such that for any $x_1, x_2, y \in \mathcal{X}$,

$$|H_{x_1}(y) - H_{x_2}(y)| \le c_0|x_1 - x_2|.$$

It is easy to check as in Lemma 4.3 that for any xi,X2,y G X, 



-rE{y) 



+ 



\Hx,{y) - HxM\ < 2|f/(^i) - U{x2)\ + \U\oo I e 
Now the result follows from (19), the Lipschitz assumption on E and the compactness of X. □

Proposition 4.1. Under the assumptions of Theorem 3.3, $\eta_n$ converges weakly in $C(\mathcal{X}, \mathbb{R})$ to a mean zero Gaussian process $G$ with covariance function $\Gamma$ and sample paths in $C(\mathcal{X}, \mathbb{R})$, and

$$E\Big[\sup_{x\in\mathcal{X}}|G(x)|\Big] < \infty. \qquad (46)$$

Proof. The existence of $G$ and the bound (46) follow from Lemma 4.10 and Dudley's Theorem on the existence of Gaussian processes with continuous sample paths (see e.g. [16] Theorem 6.1.2). Indeed, if $d_\Gamma(x,y) := \big(\Gamma(x,x) + \Gamma(y,y) - 2\Gamma(x,y)\big)^{1/2}$ denotes the pseudo-metric associated to $\Gamma$, Lemma 4.10 implies that $d_\Gamma(x,y) \le \sqrt{2c_0}\,|x-y|^{1/2}$, and since $\mathcal{X}$ is compact, this in turn implies that $\mathcal{N}(\mathcal{X}, d_\Gamma, \epsilon) \le (K\epsilon^{-2})^{d}$ for some finite constant $K$, where $\mathcal{N}(\mathcal{X}, d_\Gamma, \cdot)$ is the metric entropy of $\mathcal{X}$ under $d_\Gamma$.

We now show that $\eta_n$ converges weakly in $C(\mathcal{X}, \mathbb{R})$ to a mean zero Gaussian process with continuous sample paths and covariance function $\Gamma$. Indeed, the convergence of the finite-dimensional distributions is given by the standard central limit theorem for uniformly ergodic Markov chains. We use a moment criterion to check that the family $\{\eta_n, n \ge 0\}$ is tight ([1 I] Corollary 16.9). It suffices to check that

(i) For some $x_0 \in \mathcal{X}$, $\{\eta_n(x_0), n \ge 0\}$ is tight.

(ii) For some positive finite constants $a$, $b$, $c_0$,

$$E\big[|\eta_n(x_1) - \eta_n(x_2)|^a\big] \le c_0|x_1 - x_2|^{d+b}, \quad \text{for all } x_1, x_2 \in \mathcal{X},\ n \ge 0.$$

The condition (i) is trivially true. To check (ii), we use the resolvent $U_x^{(0)}$ to write $H_{x_1}(y) - H_{x_2}(y) = \big(U_{x_1}^{(0)}(y) - U_{x_2}^{(0)}(y)\big) - \big(P^{(0)}U_{x_1}^{(0)}(y) - P^{(0)}U_{x_2}^{(0)}(y)\big)$. It follows that



rjnixi) - rin{x2) = Mn{xi,X2) + e„(xi, 2:2), 

where M„,(x„X2) = ELi {u^hxf) - )) - {P^'^U^hxfl,) - (41)) and 

e„(xi,X2) = PWC/i?(4°)) - P(°)C/i?(4°^) - P(°)C/i°)(Xr) - PWC/i?(rf). 

The term M„(xi, X2) is a martingale and en{xi,X2) is bounded in n by a constant. By Burkholder's 
inequality and some additional straightforward arguments it follows that for any a > 2 

E [|r?„(xi) - r?„(x2)n < - ^^1^ < C\x, - X2V . 

Then it suffices to take a > d. □ 

We will also need the following simple result. 

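Lemma 4.11. If $\{x_k\}$ is a sequence of real numbers such that $x_n \to 0$ as $n \to \infty$ then $n^{-1/2}\sum_{k=1}^n k^{-1/2}x_k \to 0$ as $n \to \infty$.

Proof. Take $\epsilon > 0$. Let $n_0 \ge 1$ s.t. $n \ge n_0$ implies $|x_n| \le \epsilon$. Then for $n > n_0$,

$$n^{-1/2}\Big|\sum_{k=1}^n k^{-1/2}x_k\Big| \le n^{-1/2}\sum_{k=1}^{n_0} k^{-1/2}|x_k| + \epsilon\, n^{-1/2}\sum_{k=n_0+1}^n k^{-1/2} \le n^{-1/2}\sum_{k=1}^{n_0} k^{-1/2}|x_k| + 2\epsilon.$$

Letting $n \to \infty$ and $\epsilon \to 0$ yields the result. □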
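The statement of Lemma 4.11 can be checked numerically on a concrete null sequence. The sketch below is only an illustration: the sequence $x_k = 1/\log(k+1)$ and the function names are assumptions chosen for the example, not objects from the paper.

```python
import math


def weighted_average(x, n):
    """Return n^{-1/2} * sum_{k=1}^n k^{-1/2} * x(k)."""
    return sum(x(k) / math.sqrt(k) for k in range(1, n + 1)) / math.sqrt(n)


def x(k):
    # Hypothetical test sequence with x_k -> 0 (slowly); not from the paper.
    return 1.0 / math.log(k + 1.0)


# The weighted averages should decrease towards zero as n grows.
values = [weighted_average(x, n) for n in (10**2, 10**3, 10**4, 10**5)]
print(values)
```

The decay is slow here because $x_k$ itself decays only logarithmically; the lemma makes no claim about the rate.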

Proof of Theorem 3.3. For the rest of the proof, let $G$ be a mean zero Gaussian process on $\mathcal{X}$ with covariance function $\Gamma$ and almost surely continuous sample paths. We take $G$ independent from the process $\{(X_n^{(0)}, X_n^{(1)}), n \ge 0\}$. From the Gaussian process $G$, we define $\pi^{(1)}(G) := \int G(x)\,\pi^{(1)}(dx)$ as follows. For each sample path $\omega \in \Omega$, if $G_\omega(\cdot)$ is continuous then $\pi^{(1)}(G)(\omega) = \int \pi^{(1)}(dx)\,G_\omega(x)$. Otherwise, we set $\pi^{(1)}(G)(\omega) = 0$. Since $f \mapsto \pi^{(1)}(f)$ is a continuous map from $C(\mathcal{X}, \mathbb{R}) \to \mathbb{R}$, $\pi^{(1)}(G)$ is a well defined random variable.

Back to the partial sum S„, we have seen that 



5„ = M„ + (1 - 9i) J2 fe~'/'r?fc(4'^) + 
fc=i 



where M„ := ELi ^^(4^') " P («) ^(4-\) and e^n^ = (p (o)f/(X^'0 " P rnUiX^'')]. Clearly 



sup 

n>l 



P(o)C/(xS^V^^(o)f^(4'^)) 



<c, 
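thus the term $\epsilon_n^{(1)}$ is negligible. That is

$$\begin{aligned}
S_n &= M_n + (1-\theta_1)\sum_{k=1}^n k^{-1/2}\eta_k\big(X_k^{(1)}\big) + o_P(\sqrt{n}) \\
&= M_n + (1-\theta_1)\sum_{k=1}^n \frac{1}{\sqrt{k}}\,G\big(X_k^{(1)}\big) + (1-\theta_1)\sum_{k=1}^n \frac{1}{\sqrt{k}}\Big(\eta_k\big(X_k^{(1)}\big) - G\big(X_k^{(1)}\big)\Big) + o_P(\sqrt{n}).
\end{aligned}$$
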

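In the above, we denote by $o_P(n^{\alpha})$ any random variable $X_n$ such that $n^{-\alpha}X_n$ converges in probability to zero. To deal with the term $\sum_{k=1}^n k^{-1/2}\big(\eta_k(X_k^{(1)}) - G(X_k^{(1)})\big)$, we use the Skorohod representation of weak convergence. First note that

$$n^{-1/2}\Big|\sum_{k=1}^n k^{-1/2}\Big(\eta_k\big(X_k^{(1)}\big) - G\big(X_k^{(1)}\big)\Big)\Big| \le n^{-1/2}\sum_{k=1}^n k^{-1/2}\sup_{x\in\mathcal{X}}|\eta_k(x) - G(x)|.$$

By the Skorohod representation theorem, there exists a version $\tilde G$ of $G$ and a version $\{\tilde\eta_n, n \ge 0\}$ of the random process $\{\eta_n, n \ge 0\}$ such that $\sup_{x\in\mathcal{X}}|\tilde\eta_n(x) - \tilde G(x)| \to 0$ a.s. Therefore by Lemma 4.11, $n^{-1/2}\sum_{k=1}^n k^{-1/2}\sup_{x\in\mathcal{X}}|\tilde\eta_k(x) - \tilde G(x)|$ converges almost surely and thus in probability to zero. It follows that $n^{-1/2}\sum_{k=1}^n k^{-1/2}\big(\eta_k(X_k^{(1)}) - G(X_k^{(1)})\big)$ converges also in probability to zero. We thus arrive at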

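$$S_n = M_n + (1-\theta_1)\sum_{k=1}^n \frac{1}{\sqrt{k}}\,G\big(X_k^{(1)}\big) + o_P(\sqrt{n}).$$

To deal with the term $\sum_{k=1}^n \frac{1}{\sqrt{k}}G\big(X_k^{(1)}\big)$ we introduce $V_0 = 0$ and $V_k = \sum_{j=1}^k \big(G(X_j^{(1)}) - \pi^{(1)}(G)\big)$. Then

$$\sum_{k=1}^n \frac{1}{\sqrt{k}}\Big(G\big(X_k^{(1)}\big) - \pi^{(1)}(G)\Big) = \sum_{k=1}^n \frac{1}{\sqrt{k}}\,(V_k - V_{k-1}) = \frac{1}{\sqrt{n}}V_n + \sum_{k=2}^n \frac{1}{\sqrt{k}}\Big(\sqrt{1 + \tfrac{1}{k-1}} - 1\Big)V_{k-1}.$$

We deduce that

$$n^{-1/2}S_n = n^{-1/2}M_n + (1-\theta_1)\,\pi^{(1)}(G)\,n^{-1/2}\sum_{k=1}^n k^{-1/2} + (1-\theta_1)\Big[\frac{V_n}{n} + \frac{1}{\sqrt{n}}\sum_{k=2}^n \frac{1}{\sqrt{k}}\Big(\sqrt{1 + \tfrac{1}{k-1}} - 1\Big)V_{k-1}\Big] + o_P(1).$$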
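The rearrangement of $\sum_k k^{-1/2}(V_k - V_{k-1})$ used here is a summation by parts. The following sketch checks the identity numerically on simulated increments; the Gaussian increments merely stand in for $G(X_k^{(1)}) - \pi^{(1)}(G)$ and, like the variable names, are assumptions made for illustration.

```python
import math
import random

random.seed(0)
n = 1000

# Hypothetical increments standing in for G(X_k) - pi(G); V[k] is their partial sum.
increments = [random.gauss(0.0, 1.0) for _ in range(n)]
V = [0.0]  # V[0] = 0
for d in increments:
    V.append(V[-1] + d)

# Left-hand side: sum_{k=1}^n k^{-1/2} (V_k - V_{k-1}).
lhs = sum((V[k] - V[k - 1]) / math.sqrt(k) for k in range(1, n + 1))

# Right-hand side: n^{-1/2} V_n + sum_{k=2}^n k^{-1/2} (sqrt(1 + 1/(k-1)) - 1) V_{k-1}.
rhs = V[n] / math.sqrt(n) + sum(
    (math.sqrt(1.0 + 1.0 / (k - 1)) - 1.0) * V[k - 1] / math.sqrt(k)
    for k in range(2, n + 1)
)

print(lhs, rhs)
```

The two sides agree up to floating-point error because $(k-1)^{-1/2} - k^{-1/2} = k^{-1/2}\big(\sqrt{1 + 1/(k-1)} - 1\big)$.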
For almost every path $\omega \in \Omega$, $G_\omega(\cdot)$ is a continuous function from $\mathcal{X} \to \mathbb{R}$. Therefore, by the independence assumption and the law of large numbers of Theorem 3.1, $k^{-1}V_k = k^{-1}\sum_{j=1}^k G\big(X_j^{(1)}\big) - \pi^{(1)}(G)$ converges to zero. Using Lemma 4.11 again, we conclude that $\frac{1}{\sqrt{n}}\sum_{k=2}^n \frac{1}{\sqrt{k}}\Big(\sqrt{1 + \tfrac{1}{k-1}} - 1\Big)V_{k-1}$ also converges to zero. The term $n^{-1/2}\sum_{k=1}^n k^{-1/2}$ converges to 2. We thus arrive at

$$n^{-1/2}S_n = n^{-1/2}M_n + 2(1-\theta_1)\,\pi^{(1)}(G) + o_P(1).$$

Proceeding as in the proof of Theorem 3.2, we see that $n^{-1/2}M_n$ converges weakly to $Z$, where $Z \sim N(0, \sigma^2(f))$ and is independent from $G$. We thus conclude that $n^{-1/2}S_n$ converges weakly to $Z + 2(1-\theta_1)\int \pi^{(1)}(dx)\,G(x)$, where $Z$ and $\int \pi^{(1)}(dx)\,G(x)$ are independent.
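The factor 2 in front of $(1-\theta_1)\pi^{(1)}(G)$ comes from the limit of $n^{-1/2}\sum_{k=1}^n k^{-1/2}$. A short numerical check of that limit (the function name is an illustrative choice):

```python
import math


def normalized_sum(n):
    """Return n^{-1/2} * sum_{k=1}^n k^{-1/2}, which tends to 2 as n grows."""
    return sum(1.0 / math.sqrt(k) for k in range(1, n + 1)) / math.sqrt(n)


for n in (10**2, 10**3, 10**4, 10**5):
    print(n, normalized_sum(n))
```

The values approach 2 from below, since $\sum_{k=1}^n k^{-1/2} \le 2\sqrt{n}$.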

Since $f \mapsto \pi^{(1)}(f)$ is a continuous bounded function from $C(\mathcal{X}, \mathbb{R}) \to \mathbb{R}$, it follows from the above that $\pi^{(1)}(\eta_n)$ converges weakly to $\pi^{(1)}(G)$. But $\pi^{(1)}(\eta_n) = n^{-1/2}\sum_{k=1}^n \int \pi^{(1)}(dx)\,H_x\big(X_k^{(0)}\big)$. By the central limit theorem for the uniformly ergodic chain $\{X_n^{(0)}, n \ge 0\}$, the latter term $n^{-1/2}\sum_{k=1}^n \int \pi^{(1)}(dx)\,H_x\big(X_k^{(0)}\big)$ converges weakly to $N(0, \Gamma(g,g))$, where $g(\cdot) = \int \pi^{(1)}(dx)\,H_x(\cdot)$, and we are finished. □



4.6. Proof of Proposition 3.1

Proof. In the present case, one can check that U{x) = Z]j>o(-^f(o))''/(^) = Ej>o 

Hx{y) = U{y). Then the resolvent function U^^ becomes U^^ (y) = U'^^^ (y) = J2j>o P'^U{y) which 

allows use to write E?=i H^ixf) = M^^^ + , where Mf ^ = E?=i " 

and ef = PU^^\X^^^) - Pt/(o)(xf ^). Thus we have: 

n 

Sn = Mn + (1 - 0i) k-^M[!'^ + en, 
k=l 

where = ei, 

+ Efc=i^"^4°^- The term Cn is negligible and is suffices to study the limit of 



E 



Mn + (1 - Oi) ^ fc-^Aff 
V fe=l / 



E 



(m2) + (1 - 



E 



k 



k=l 

+ 2(l-ei)E 



Define $D^{(0)}(x,y) = U^{(0)}(y) - PU^{(0)}(x)$ and $D^{(1)}(x,y) = U(y) - PU(x)$. It is easy to see that for any $i, j \ge 1$, $E\big(D^{(1)}(X_{i-1}^{(1)}, X_i^{(1)})\,D^{(0)}(X_{j-1}^{(0)}, X_j^{(0)})\big) = 0$, from which we deduce that $E\big[M_n\sum_{k=1}^n k^{-1}M_k^{(0)}\big] = 0$.

We write $\sum_{k=1}^n k^{-1}M_k^{(0)} = \sum_{j=1}^n\sum_{k=j}^n k^{-1}D^{(0)}\big(X_{j-1}^{(0)}, X_j^{(0)}\big)$ and since the terms $D^{(0)}\big(X_{j-1}^{(0)}, X_j^{(0)}\big)$ are martingale differences, we get



E 



Vfc=l / 



E 



n n 



n I n 



f f n j n 

j TTidx) J P(x,(iy)(Z)(0)(x,2/))2^ 



j=l \k=j 



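Since $D^{(0)}$ is a bounded continuous function and $\{X_n^{(0)}\}$ is uniformly ergodic, the second term on the rhs divided by $n$ converges to zero. Then we notice that $\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^n\big(\sum_{k=j}^n k^{-1}\big)^2 = 2$ and we conclude that

$$\lim_{n\to\infty} E\big[n^{-1}S_n^2\big] = \int \pi(dx)\int P(x,dy)\Big\{\big(D^{(1)}(x,y)\big)^2 + 2(1-\theta_1)^2\big(D^{(0)}(x,y)\big)^2\Big\}.$$

□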
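The constant 2 multiplying $(1-\theta_1)^2$ comes from the limit $\frac{1}{n}\sum_{j=1}^n\big(\sum_{k=j}^n k^{-1}\big)^2 \to 2$, which matches $\int_0^1 (\log t)^2\,dt = 2$. A minimal numerical check, with an illustrative function name:

```python
def harmonic_tail_square_mean(n):
    """Return (1/n) * sum_{j=1}^n (sum_{k=j}^n 1/k)^2."""
    tail = 0.0   # running value of sum_{k=j}^n 1/k, built from j = n downwards
    total = 0.0
    for j in range(n, 0, -1):
        tail += 1.0 / j
        total += tail * tail
    return total / n


for n in (10, 1000, 100_000):
    print(n, harmonic_tail_square_mean(n))
```

The printed values increase towards 2 as n grows.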